An Opine on Optimization: The Alpha Predator™ Problem

 

By: Blockforce Capital Research

In our previous post [APM I], we unveiled the Alpha Predator™ Model (APM) for systematic digital asset trading, which capitalizes on complexity and flexibility in order to capture alpha and outperform markets. However, despite the Alpha Predator being a multi-dimensional phantasmagorical predator, it is simultaneously a benevolent beast looking to spread its alpha proliferation across the fertile remains of the improved market ecosystem left in its wake. To use these alpha seeds to grow the blockchain sector for investors of all types, we must first invoke machine learning algorithms to find the best path toward germination.

As a quick reminder, here is the APM decision tree we showed in APM I:

[Figure: the APM decision tree from APM I]

We have many choices to make to determine how APM makes decisions, branching from the root (market direction) at the top to the leaves (the Stochastic Oscillators) at the bottom. In its current form, it is these Stochastic Oscillators [learn more here, and here] that ultimately make the trading decisions, and we would like to optimize their parameters so that those decisions are as good as possible.

Technically speaking, APM has dynamical parameters known in physics as degrees of freedom, and we need to find the optimal choice for them. Each degree of freedom is one independent direction in which the system can move, and "optimal" means the choice that gives us the highest investment return at a controlled level of risk. We need to find the delicate balance at which everything hangs together in just the right way to achieve this. We search the parameter space and test each candidate by running backtests, which apply the APM to historical price data and measure its performance.

Let’s do a quick count of degrees of freedom based on the current version of APM depicted in the graphic:

For the Market Diagnostics, we have Direction, Volatility, and Spread. In a future post, we will talk about the details of these diagnostics. For now, we count their parameters: Direction has five, Volatility has four, and Spread has four as well, for a total of 13 degrees of freedom. A typical parameter is the choice of the trading period underlying a diagnostic. This period could be 30 minutes. Other parameters are then often integer multiples of that period, defining the windows over which computations are performed. Still others are threshold values above or below which certain conditions prevail.

13 Degrees of Freedom, 10 Trillion Possibilities, and the Age of the Universe

Let’s stay with diagnostics for a moment and do some counting. We want to find parameter choices that give us the best possible description of the market. We can do this -- in principle -- by trial and error. Let’s say for each of the Diagnostics’ parameters we want to try 10 different values to see where this leads. We could then try all the possible combinations and keep track of the best. Let’s count where this gets us:

Direction has 5 parameters with 10 choices each. That comes out to 10 x 10 x 10 x 10 x 10 = 10^5 = 100,000 different tries. Not too bad. The Volatility parameters with ten choices each get us to 10,000. Same for Spread. Again, this does not sound too bad.

However, here’s the catch: Direction, Volatility, and Spread work in concert to determine APM’s decision tree. So, we are dealing with 100,000 x 10,000 x 10,000 = 10^13, or in words, 10 trillion different choices we would have to try!

One should never underestimate the sheer enormity when dealing with trillions. For example, one trillion dollar bills laid end-to-end would exceed the distance between the Earth and the Sun, and 10 trillion is one estimate for the total number of galaxies in the Universe [learn more here]. For Alpha Predator, with these 10 trillion choices to choose from, if each calculation takes 10 seconds and we use a single computer, we would be at it for a bit more than 3 million years. Moreover, that is just the Market Diagnostics.
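The counting above can be reproduced in a few lines. This is only a sketch of the arithmetic, using the figures quoted in the text (10 trial values per parameter, 10 seconds per backtest, a single computer); the variable and function names are ours, chosen for illustration.

```python
import math

# Number of parameters per diagnostic, as counted in the text.
PARAMS = {"Direction": 5, "Volatility": 4, "Spread": 4}
CHOICES_PER_PARAM = 10       # trial values per parameter
SECONDS_PER_BACKTEST = 10    # assumed runtime of one evaluation
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def grid_size(params, choices):
    """Number of combinations when every parameter is varied jointly."""
    return math.prod(choices ** n for n in params.values())


combos = grid_size(PARAMS, CHOICES_PER_PARAM)
years = combos * SECONDS_PER_BACKTEST / SECONDS_PER_YEAR
print(f"{combos:,} combinations, about {years / 1e6:.2f} million years")
```

The grid grows multiplicatively because the diagnostics act jointly, which is why adding even one more parameter multiplies the runtime by another factor of 10.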

Progressing along the decision tree, each branch leads to 18 different leaves, and on each leaf there is a Stochastic Oscillator that makes trading decisions. We typically have four parameters for each Stochastic Oscillator. Luckily, each Oscillator is independent of the others, so adding the 18 leaves to the problem makes it “only” 18,000 times harder: about 54.8 billion years for trying out all possible parameters of the APM decision tree.

54.8 billion years is longer than the age of the Universe (13.8 billion years)! However refined and patient our temperaments may be, we are limited here by the age of the Universe. We do not have a choice: we are going to need a better way to optimize APM much, much sooner.

Machine Learning via Bayesian Optimization

Luckily, our team of computational and data scientists has a strong background in machine learning, the burgeoning scientific and engineering discipline that leverages the power of modern computers and advanced mathematical modeling techniques to let computer algorithms learn and make well-informed decisions.

Most machine learning algorithms have unknown parameters that need to be adjusted in the best way possible. "Best" is defined by a loss function (see Fig. 1), which you want to minimize. In our case, this may be the negative of the cumulative returns, so that minimizing the loss is the same as maximizing returns. Thus, in order to minimize the loss function, we need to optimize the parameters. However, when dealing with non-differentiable phantasmagorical beasts, we need to invoke a particular method of machine learning known as Bayesian optimization.
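To make the loss concrete, here is a minimal sketch assuming the loss is the negative of the compounded cumulative return over a series of per-period returns. The function names and the compounding convention are our assumptions for illustration, not a description of the APM code.

```python
def cumulative_return(period_returns):
    """Compound per-period fractional returns into one cumulative return."""
    growth = 1.0
    for r in period_returns:
        growth *= 1.0 + r
    return growth - 1.0


def loss(period_returns):
    """Loss to minimize: the negative of the cumulative return."""
    return -cumulative_return(period_returns)


# A strategy with higher cumulative return has a lower loss.
steady = [0.01, 0.02, 0.01]
choppy = [0.10, -0.10, 0.00]
print(loss(steady), loss(choppy))
```

Flipping the sign is purely a convention: most optimizers are written to minimize, so a quantity we want to maximize is handed over as its negative.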

Bayesian optimization is a little different from standard routines like Newton’s Method or Limited-memory BFGS because it is a machine learning approach to optimization. We use it instead of something more standard because APM is a weird function. Given the data, we are optimizing APM’s parameters to maximize returns. APM does not lend itself to more common optimization approaches for two reasons: each evaluation takes a very long time to run (minutes, compared to seconds in the traditional approach), and you can’t take the gradient of the original function (gradient-based methods need the gradient in order to take small steps toward the max/min). Instead, Bayesian optimization constructs a surrogate function, which is smooth and differentiable. Constructing the surrogate function is the machine learning part. Traversing the surrogate function to the minimum is the optimization.

Example Surrogate Surface

Fig. 1 An example two-dimensional surrogate surface. The smallest loss is obtained at the minimum of the surrogate surface -- the darkest blue region, where 𝜃_1 = -2 and 𝜃_2 = 2.

Here’s how it works. Mathematically speaking, the result (e.g., total return, maximum drawdown [learn more here, and here] or any other performance metric of your choice) of an APM backtest is a function f(x) of variables x = { x1, x2, x3, x4, … } representing APM’s many different degrees of freedom. For simplicity, let’s stick with Total Return = f(x). We want to maximize total returns by finding parameters x that yield the global maximum of f(x).

Bayesian optimization works for this problem under two assumptions: there is some level of smoothness in the parameter space, and that level of smoothness is isotropic (the same in every direction). The amount of smoothness, or "level of similarity" between neighboring regions of parameter space, is learned during the optimization process, so the first assumption can be verified to some degree. The next point in the parameter space is chosen by balancing two goals, exploration and maximization (the latter is often called exploitation). The surrogate function is dominated by uncertainty in regions of the parameter space that haven't been explored. Sometimes the optimizer will pick a point near the current best that is expected to increase f(x); sometimes it will try a point in a region that hasn't been explored yet.
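One common way to encode this balance is an upper confidence bound, which scores each candidate by the surrogate's predicted value plus a multiple of its uncertainty. The post does not say which rule APM's optimizer uses, so the sketch below is an assumption for illustration; the candidate numbers and the weight kappa are made up.

```python
def upper_confidence_bound(mean, std, kappa):
    """Score = predicted value + kappa * uncertainty (larger is better)."""
    return mean + kappa * std


def pick_next(candidates, kappa):
    """Return the candidate with the highest acquisition score."""
    return max(
        candidates,
        key=lambda c: upper_confidence_bound(c["mean"], c["std"], kappa),
    )


# Hypothetical surrogate predictions at two candidate points.
candidates = [
    {"name": "near current best", "mean": 0.30, "std": 0.02},
    {"name": "unexplored region", "mean": 0.10, "std": 0.20},
]

print(pick_next(candidates, kappa=2.0)["name"])   # uncertainty weighted heavily
print(pick_next(candidates, kappa=0.5)["name"])   # predicted value weighted heavily
```

A large kappa pushes the search toward regions the surrogate knows little about, while a small kappa keeps it close to points that already look good.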

As input to the optimization process, we need to provide the possible range of values each of the x parameters may take. For example, we may decide that we want to average an indicator quantity over n market periods, where we might let n vary from 100 to 300. So our range for this parameter is 100 to 300. We define a range for all parameters. That gives us a high-dimensional hypercube defining the space in which the optimizer will operate. Additionally, we can place constraints on relationships between parameters. For example, we can enforce that one parameter is always smaller than another, which is essential when working with a long-term and a short-term moving average.
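A search space of this kind can be written down as a set of per-parameter ranges plus a feasibility check. The sketch below is illustrative only: the parameter names short_window and long_window and the range for short_window are hypothetical, while the 100 to 300 range mirrors the example in the text.

```python
# Hypothetical ranges (low, high), measured in market periods.
BOUNDS = {
    "short_window": (20, 200),
    "long_window": (100, 300),
}


def in_hypercube(x, bounds=BOUNDS):
    """True if every parameter lies inside its allowed range."""
    return all(low <= x[name] <= high for name, (low, high) in bounds.items())


def satisfies_constraints(x):
    """The short-term average must use fewer periods than the long-term one."""
    return x["short_window"] < x["long_window"]


def is_feasible(x):
    """A point is usable only if it is in range and obeys the constraints."""
    return in_hypercube(x) and satisfies_constraints(x)


print(is_feasible({"short_window": 50, "long_window": 200}))
print(is_feasible({"short_window": 150, "long_window": 120}))
```

Note that the second point lies inside the hypercube but is still rejected, because the ranges overlap and only the constraint rules out a short window longer than the long one.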

We prepare a Bayesian optimization run by sampling parameter values x from the hypercube so that all regions of the hypercube are represented. We then run APM to compute f(x). This sample gives the optimizer an initial set of x and f(x) with which to work. Again, our goal is to find the value of x that maximizes f(x).
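One standard way to make sure all regions of the hypercube are represented is Latin hypercube sampling, which splits each parameter's range into equal slices and places exactly one sample in each slice. The post does not name the sampling scheme used for APM, so this is a sketch of one reasonable choice; the bounds below are hypothetical.

```python
import random


def latin_hypercube(bounds, n_samples, rng):
    """Draw n_samples points with one point per slice in every dimension."""
    columns = []
    for low, high in bounds:
        width = (high - low) / n_samples
        # One random value inside each of the n_samples equal-width slices.
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so slices are paired at random across dimensions.
        rng.shuffle(column)
        columns.append(column)
    return [tuple(col[i] for col in columns) for i in range(n_samples)]


# Hypothetical two-parameter search space.
bounds = [(100.0, 300.0), (0.0, 1.0)]
initial_design = latin_hypercube(bounds, n_samples=10, rng=random.Random(42))
for point in initial_design:
    print(point)
```

Each point in initial_design would then be passed to a backtest to obtain the first set of x and f(x) pairs.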

Next, based on the initial set of x and f(x), the optimizer builds a statistical model that approximately describes how f depends on x. That is called a surrogate model. The surrogate model is then used to predict a new value of x that has a high probability of increasing f(x). APM is run with this new x, and the resulting f(x) is recorded and used in the next step.

Initially, the predictions are not very accurate, but the process is repeated. With each repetition, more information is added to the surrogate model, and it improves.

It learns!

In this way, we can find optimal parameters for APM with only a few hundred evaluations of f(x), compared to the trillions needed if we were using a simple, brute-force approach without machine learning.
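The loop described above can be sketched end to end on a toy problem. Everything specific here is an assumption made for illustration: toy_backtest stands in for a real APM backtest, the surrogate is a zero-mean Gaussian process with a fixed-width squared-exponential kernel, and the next point is chosen with an upper confidence bound. The post does not state which surrogate or selection rule APM uses.

```python
import math
import random


def toy_backtest(x):
    """Hypothetical stand-in for f(x): two bumps, the taller one near x = 0.7."""
    return math.exp(-((x - 0.7) ** 2) / 0.02) + 0.5 * math.exp(-((x - 0.2) ** 2) / 0.01)


def rbf(a, b, length_scale=0.1):
    """Squared-exponential kernel: similarity of two points."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)


def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def fit_gp(xs, ys, noise=1e-6):
    """Build the surrogate: inverse kernel matrix and weights for the mean."""
    n = len(xs)
    K = [[rbf(xs[i], xs[j]) + (noise if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    cols = [solve(K, [1.0 if r == c else 0.0 for r in range(n)]) for c in range(n)]
    K_inv = [[cols[c][r] for c in range(n)] for r in range(n)]
    alpha = [sum(K_inv[r][c] * ys[c] for c in range(n)) for r in range(n)]
    return K_inv, alpha


def gp_predict(xs, K_inv, alpha, x_new):
    """Surrogate prediction at x_new: (mean, standard deviation)."""
    k = [rbf(a, x_new) for a in xs]
    n = len(xs)
    mean = sum(ki * ai for ki, ai in zip(k, alpha))
    var = 1.0 - sum(k[i] * K_inv[i][j] * k[j] for i in range(n) for j in range(n))
    return mean, math.sqrt(max(var, 0.0))


def bayes_opt(f, n_init=4, n_iter=15, kappa=2.0, seed=0):
    """Maximize f on [0, 1]; returns (best x, best f(x), number of evaluations)."""
    rng = random.Random(seed)
    xs = [rng.random() for _ in range(n_init)]   # initial sample
    ys = [f(x) for x in xs]
    grid = [i / 100 for i in range(101)]         # candidate points
    for _ in range(n_iter):
        K_inv, alpha = fit_gp(xs, ys)            # refit the surrogate

        def ucb(x):
            mean, std = gp_predict(xs, K_inv, alpha, x)
            return mean + kappa * std

        candidates = [g for g in grid if g not in xs]
        x_next = max(candidates, key=ucb)        # most promising candidate
        xs.append(x_next)
        ys.append(f(x_next))                     # run the expensive evaluation
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], len(xs)


best_x, best_y, n_evals = bayes_opt(toy_backtest)
print(f"best x = {best_x:.3f}, f(x) = {best_y:.3f} after {n_evals} evaluations")
```

The expensive function is called only 19 times here; all the remaining work happens on the cheap surrogate, which is the reason the approach scales to backtests that take minutes each.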

Bayesian optimization is a robust technique, so it is not surprising that it shows up in many fields of science and engineering. For example, Bayesian optimization has been used to optimize setups for experimental studies and for finding the best parameters of predictive models for fluid dynamics or geophysics. Our work with APM shows one application of how Bayesian optimization can be a powerful tool in quantitative finance.

Conclusion

As the predator slowly skulks away, behind the budding alpha proliferation garden it has been patiently cultivating, we are afforded a moment to catch our breath and contemplate what we learned. Using the machine learning technique of Bayesian optimization, we can minimize the loss function of the mathematical surrogate surface that the alpha predator model spans in parameter space, allowing us to optimize our investment returns at controlled levels of risk. This optimization and implementation of APM are only one part of our overall Alpha Predator Quantitative Ecosystem at Blockforce Capital, in which we also utilize arbitrage and our proprietary Token Rotation model [both to be discussed in future articles].

In the meantime, watch your step as you leave the garden, for the predator lurks.



Blockforce Capital does not intend for the information presented herein to serve as the basis of any investment decision. The information is given in summary form and does not purport to be complete. The sole purpose of this material is to inform, and in no way is intended to be an offer or solicitation to purchase or sell any security, other investment or services, or to attract any funds or deposits, nor shall any securities be offered or sold to any person in any jurisdiction in which an offer, solicitation, purchase, or sale would be unlawful under the laws of such jurisdiction. Any such offer would only be made to qualifying accredited investors by means of formal offering documents, the terms of which would govern in all respects. This information does not constitute general or personal investment advice or take into account the individual financial circumstances or investment objectives, or financial conditions of the individuals who read it. You are cautioned against using this information as the basis for making a decision to purchase any security. Past performance does not guarantee future results.

 
Kasey Price