A Quants Journal, Episode 4: ‘…and we're in Business!’
It has been a while, so where were we? Right, we conducted our first forecast using OLS and some weird reshuffling technique I came up with, which already looked quite promising. That's because we had a success rate higher than what you get from picking stocks by chance, say by coin toss.
I managed to increase the Success Rate significantly since the last episode by adding another two explanatory variables. Many of you probably know all about the Capital Asset Pricing Model, CAPM for short. For those who do not: the metric at the heart of this one-factor model, which attempts to explain the excess returns of an asset, is called market beta. It measures how strongly an asset's return moves with the market, and thereby the risk premium the asset should earn, using this simple formula:
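Assuming the standard textbook definition is meant, market beta is the covariance of the asset's return with the market's return, divided by the variance of the market's return. Here r_i and r_m denote the returns of asset i and of the market, and r_f (used in the second line) is the risk-free rate:

$$\beta_i = \frac{\operatorname{Cov}(r_i, r_m)}{\operatorname{Var}(r_m)}$$

Under CAPM, this beta then scales the market's risk premium into the expected excess return of the asset:

$$\mathbb{E}[r_i] - r_f = \beta_i \left( \mathbb{E}[r_m] - r_f \right)$$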
Or in code, computing the market beta for a time-horizon of one year (~256 working days):
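A minimal sketch of such a computation, not necessarily the code used for this series: it assumes plain daily returns (the risk-free rate is ignored for simplicity), uses the textbook covariance-over-variance definition, and runs on made-up synthetic data. The function name `market_beta` and all numbers are illustrative assumptions.

```python
import random

TRADING_DAYS = 256  # one-year horizon as used in the article


def mean(values):
    return sum(values) / len(values)


def market_beta(asset_returns, market_returns, window=TRADING_DAYS):
    """Beta = Cov(asset, market) / Var(market) over the last `window` returns."""
    a = asset_returns[-window:]
    m = market_returns[-window:]
    if len(a) != len(m) or len(a) < 2:
        raise ValueError("need two equally long return series with at least 2 points")
    mean_a, mean_m = mean(a), mean(m)
    # Sample covariance and variance (both divided by n - 1, which cancels in the ratio).
    cov = sum((x - mean_a) * (y - mean_m) for x, y in zip(a, m)) / (len(a) - 1)
    var = sum((y - mean_m) ** 2 for y in m) / (len(m) - 1)
    return cov / var


# Synthetic daily returns (made up): the asset moves 1.5x the market plus noise.
rng = random.Random(42)
market = [rng.gauss(0.0004, 0.01) for _ in range(TRADING_DAYS)]
asset = [1.5 * r + rng.gauss(0.0, 0.002) for r in market]

beta = market_beta(asset, market)
print(round(beta, 2))
```

On this synthetic data the estimate should land close to the 1.5 that was baked in; on real data you would feed in the stock's and the index's daily returns instead.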
It should identify assets that react more strongly to economic swings than others. In theory, the more risk exposure, the higher the return should be, so we would expect stocks with a high market beta to be more volatile but also to earn us a higher risk premium.
The other added variable is called “Growth1Q”, which naively tries to capture any momentum by looking at how the stock price performed over the past three months. Maybe this enables us to “hop on the hype train”. Choo choo.
Without further ado, let's look at the interesting results, meaning the performance of a portfolio that consists only of stocks that our algorithm suggested. We look at the one-year (256 working days) performance, normalizing the portfolio value to one at the beginning.
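As a rough illustration of what a variable like “Growth1Q” could look like, here is a sketch that computes the relative price change over one quarter. The quarter length of 64 trading days (256 / 4) and the function name `growth_1q` are assumptions; the exact definition used in the model is not spelled out here.

```python
QUARTER_DAYS = 64  # assumption: one quarter = 256 / 4 trading days


def growth_1q(prices, lookback=QUARTER_DAYS):
    """Relative price change over the last `lookback` trading days."""
    if len(prices) <= lookback:
        raise ValueError("not enough price history")
    return prices[-1] / prices[-1 - lookback] - 1.0


# Made-up price series that rises by 0.1 per day, starting at 100.
prices = [100.0 + 0.1 * day for day in range(100)]
g = growth_1q(prices)
print(f"{g:.2%}")
```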
The raw numbers read as follows: both portfolios beat the market over the course of one year by a margin of ~8 percentage points. The reason we have two lines representing two different portfolios here is that the first model is the one maximizing R², while the second, called “alternative”, is the one chosen by the Bayesian Information Criterion (BIC), another measure of a model’s quality. The Sharpe Ratio has been beaten by ~10. Not too bad.
But it's not all that perfect, as one can see from the graph. Unfortunately, during the pre-Covid period the portfolios' respective performances were significantly less impressive. In fact, had I shortened the time horizon to, say, 100 days, this would have been a disappointment. What I call “Relative Performance over the year” varies widely by model. The one maximizing R² closes above the S&P 500 on 62.5% of trading days, while the BIC model disappoints by “being worth less than the index more often than not”. What both have in common is that they managed to pick stocks which recovered splendidly from the first Covid shock. Not great, not terrible.
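To make the two metrics above a bit more tangible, here is a small sketch of how “Relative Performance” (the share of trading days on which the normalized portfolio closes above the normalized index) and an annualized Sharpe Ratio could be computed. It assumes a risk-free rate of zero and 256 trading days per year; the function names and the tiny value series are made up for illustration.

```python
import math


def normalize(values):
    """Rescale a value series so that it starts at 1."""
    return [v / values[0] for v in values]


def relative_performance(portfolio, index):
    """Share of trading days on which the portfolio closes above the index."""
    p, i = normalize(portfolio), normalize(index)
    # Day 0 is skipped: both series equal 1 there by construction.
    days_above = sum(1 for pv, iv in zip(p[1:], i[1:]) if pv > iv)
    return days_above / (len(p) - 1)


def sharpe_ratio(values, periods_per_year=256):
    """Annualized Sharpe Ratio from a value series, assuming a zero risk-free rate."""
    returns = [b / a - 1.0 for a, b in zip(values, values[1:])]
    avg = sum(returns) / len(returns)
    var = sum((r - avg) ** 2 for r in returns) / (len(returns) - 1)
    return avg / math.sqrt(var) * math.sqrt(periods_per_year)


# Made-up closing values for five trading days.
portfolio = [100.0, 102.0, 101.0, 105.0, 107.0]
index = [50.0, 50.5, 51.0, 51.5, 52.0]

rel_perf = relative_performance(portfolio, index)
sharpe = sharpe_ratio(portfolio)
print(rel_perf, round(sharpe, 2))
```

In this toy series the portfolio is ahead of the index on three of the four days after the start, so the relative performance is 75%.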
For now, I believe that is about as good as it gets using OLS. We (mis)used the regression model to classify data by making up artificial hurdles, so that when an estimate passes a certain threshold it is classified as a “hold” or “buy”. There are certainly better algorithms which are explicitly designed to sort data into categories; one of them is the multinomial regression model. I will spend the next episode exploring said algorithm. Why? Because of this massive cliffhanger:
We are talking about beating the market by over 10 percentage points, meaning a 23% return and a relative performance of >75% YoY while doubling the Sharpe Ratio.
Do I have your attention now?
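Coming back to the (mis)use of OLS as a classifier for a second: the hurdle idea boils down to mapping a predicted return onto a label. A minimal sketch follows, where the threshold values, the extra “sell” label and the ticker names are purely hypothetical and not the ones used in the model.

```python
def classify(predicted_return, hold_threshold=0.0, buy_threshold=0.05):
    """Turn a regression estimate into a label via fixed hurdles (hypothetical values)."""
    if predicted_return >= buy_threshold:
        return "buy"
    if predicted_return >= hold_threshold:
        return "hold"
    return "sell"  # assumption: everything below the lowest hurdle is a "sell"


# Hypothetical tickers with made-up OLS predictions of next-period return.
predictions = {"AAA": 0.08, "BBB": 0.02, "CCC": -0.03}
labels = {ticker: classify(value) for ticker, value in predictions.items()}
print(labels)
```

A multinomial model skips this manual step and estimates the probability of each label directly.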
The first little achievements using OLS were mildly encouraging, but honestly, I am more than thrilled now to experiment more with Multinomial Classification. This will include:
1. A lot of back testing
Applying the model to a larger amount of historical data. Let's not forget that the past year has been more than extraordinary in many ways and might not be representative, meaning the results may not hold outside of a crisis.
2. Start to do proper Portfolio-Management
I think it's foolish to just go “all in” and only use the recommended stocks, basically putting my capital entirely in the hands of this sometimes-stupid machine. Starting from the market portfolio, I would weight the individual stocks in a more moderate way according to their respective forecast.
3. Do not worry too much about model selection after all
Now, hear me out. Last episode we talked about how terrible the model's accuracy was, indicated by a very low R². But how relevant is accuracy here after all? Let's put it this way: if we wanted to find out what the individual contributions of the respective variables were, these models would be utterly useless. But honestly, I do not care that much about why the stock is recommended, I only care whether it is. Adding more and more variables with little explanatory power will inflate the model's variance, sure, but they will not create a bias and hence are not a big problem at this point.
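The moderate weighting idea from point 2 could look roughly like the sketch below: start from market weights and tilt them up or down by a forecast score instead of going all in. The scoring scheme (+1 for “buy”, 0 for “hold”, -1 for “sell”), the `strength` parameter and the tickers are assumptions made up for this illustration.

```python
def tilt_weights(market_weights, scores, strength=0.5):
    """Tilt market weights by a forecast score; strength=0 returns the market portfolio."""
    raw = {
        ticker: weight * max(0.0, 1.0 + strength * scores.get(ticker, 0.0))
        for ticker, weight in market_weights.items()
    }
    total = sum(raw.values())
    if total == 0:
        raise ValueError("all weights were tilted to zero")
    # Renormalize so that the tilted weights sum to one again.
    return {ticker: value / total for ticker, value in raw.items()}


# Hypothetical market weights and forecast scores (+1 buy, 0 hold, -1 sell).
market_weights = {"AAA": 0.5, "BBB": 0.3, "CCC": 0.2}
scores = {"AAA": 1.0, "BBB": 0.0, "CCC": -1.0}

weights = tilt_weights(market_weights, scores)
print({ticker: round(w, 3) for ticker, w in weights.items()})
```

With `strength` between 0 and 1 no position is ever dropped entirely, which keeps the portfolio anchored to the market while still leaning towards the recommended stocks.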
A quick note on the reshuffling trick I pulled off (running the regression algorithm only on 80% of the data repeatedly). I am by no means the first to come up with the idea of using a (linear) regression as the tool of choice. But those who do often restrict themselves to certain types of industries to be included in the sample. The financial sector, for example, is often left out because ratios like RoE and RoA are so atypical there (think about it, a bank is all about investing other people’s money, so of course the RoE is going to be insane). Surely it would be interesting to check for patterns here later.
Bottom Line: If you want to find out how to create the algorithm behind figure 3, how Portfolio Management might further improve its performance, and generally how to get rich quick, then you should, as always, stay tuned for the next episode of A Quants Journal.
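For readers who want to picture the reshuffling trick mentioned above, here is a rough sketch: fit the regression repeatedly on random 80% subsamples and collect the estimates. Averaging the coefficients at the end, the single explanatory variable and the synthetic data are simplifying assumptions of this sketch, not a description of the exact procedure used in the series.

```python
import random


def ols_fit(xs, ys):
    """Simple one-variable OLS: returns (slope, intercept)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def resampled_ols(xs, ys, share=0.8, runs=200, seed=0):
    """Fit OLS on `runs` random subsamples of size `share` and average the results."""
    rng = random.Random(seed)
    n = len(xs)
    k = max(2, int(share * n))
    slopes, intercepts = [], []
    for _ in range(runs):
        idx = rng.sample(range(n), k)  # draw 80% of the rows without replacement
        slope, intercept = ols_fit([xs[i] for i in idx], [ys[i] for i in idx])
        slopes.append(slope)
        intercepts.append(intercept)
    return sum(slopes) / runs, sum(intercepts) / runs


# Synthetic data (made up): y = 2x + 1 plus noise.
data_rng = random.Random(1)
xs = [data_rng.uniform(0.0, 10.0) for _ in range(200)]
ys = [2.0 * x + 1.0 + data_rng.gauss(0.0, 0.5) for x in xs]

avg_slope, avg_intercept = resampled_ols(xs, ys)
print(round(avg_slope, 2), round(avg_intercept, 2))
```

The spread of the individual slopes across runs also gives a cheap impression of how stable an estimate is.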
Great to still have you with me!