• Bocconi Students Options Club

A Quants Journey, Episode 3: ‘Machine Learning at last’

After hours of data mining, let us return to the surface and into the lofty and exciting world of machine learning.

A quick recap: we gathered data for roughly five hundred firms over the course of two years and computed around a dozen common KPIs for each of them. Now, let us try to find out whether any of these KPIs have some ability to predict the companies’ excess stock returns in the following period.

Today’s (and most likely also the next few episodes’) schedule looks like this:

1. Investigation of the estimator for 2017: Start with a naïve approach (least squares)

2. Using the model from (1), trained with data from 2017, to forecast 2018

3. Fine-tuning (a.k.a. model selection) for (1)

4. Decide on a stock-picking strategy (e.g., the Black-Litterman model) to incorporate the new beliefs

5. Run a simulation like in Episode I and compare to the S&P 500

6. Repeat with various other (non-) parametric models (e.g., probit, cluster analysis)

If necessary:

  • More data mining to fuel the models with more information (e.g., Market-Beta, more dynamics such as asset growth, macroeconomic data, …)

This section of the rabbit hole we call quantitative finance is about as winding and branched as it gets, so let us jump in headfirst.

1. OLS

We start with the most basic concept in econometrics. In fact, it is so ordinary that they decided to name it accordingly. Of course, we are talking about ordinary least squares, or OLS for short. I am curious to find out how far it will be able to take us. Here is a quick recap of what an OLS regression is about, in case you missed that statistics class.

Imagine your data looks something like this:

Our task in this example is to find out how the recent short squeeze is correlated with the amount of news coverage and institutions talking down to retail investors live on CNBC. At first glance, we can already make out a strong, positive relationship. OLS quantifies this relationship by fitting a linear trendline through this cloud of data points, namely the line that minimizes the sum of squared vertical distances between the points and the line.
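To make the mechanics concrete, here is a minimal Python sketch of fitting such a trendline with the closed-form least-squares formulas (slope = covariance / variance, intercept from the means). The price and interview numbers are hypothetical toy values made up for illustration, not the data behind the chart.

```python
import numpy as np

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y ≈ a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Closed-form OLS: slope = sum of cross-deviations / sum of squared x-deviations
    slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
    # The fitted line always passes through the point of means
    intercept = y.mean() - slope * x.mean()
    return intercept, slope

# Hypothetical toy data: share price (USD) vs. number of interviews
prices = np.array([40.0, 100.0, 150.0, 250.0, 320.0])
interviews = np.array([10.0, 22.0, 33.0, 52.0, 68.0])

a, b = fit_line(prices, interviews)
prediction = a + b * 420.0  # predicted interviews at a share price of $420
```

`np.polyfit(prices, interviews, 1)` returns the same slope and intercept; the explicit formulas are simply easier to read.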

The fitted trendline tells us that, for every dollar the share price increases, on average 0.21 more condescending interviews about market manipulation and the lack of fundamental analysis on the part of millennial retail investors take place. Let’s say GME is worth $420 a share; ignoring the intercept, our best prediction would be 420 × 0.21 = 88.2 interviews. Everything clear? Then let’s look at our first regression output of Excess Stock Return on RoA, RoE, Sales Growth, Current Ratio, Earnings per Share, Price/Book Ratio and Debt/Equity Ratio for our entire sample:

What do we make of that? The first thing to investigate is the significance of the estimate for each variable (we call them betas). Here it is best to look at the p-value of each beta, which indicates the amount of evidence against the null hypothesis that the true beta is zero. In other words: how sure can we be that there is a non-zero effect of, say, RoE on Excess Return? The lower the p-value, the better. Rule of thumb: we can consider a variable statistically significant if its p-value is below 5%.

In this regression, the Current Ratio (interestingly enough) seems to be very reliable and RoE also does well, while Sales Growth barely contributes anything. The overall fit of the regression, measured by R² = 0.03, is close to terrible.
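To see where the betas, p-values and R² in such an output come from, here is a minimal NumPy sketch on simulated data. The KPI matrix and returns are hypothetical, and the p-values use a normal approximation to the t-distribution (an assumption that is close for samples of a few hundred firms), so they can differ slightly from exact regression software output.

```python
import math
import numpy as np

def ols_summary(X, y):
    """OLS with an intercept: betas, standard errors, two-sided p-values and R^2.

    p-values use a normal approximation to the t-distribution.
    """
    X1 = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    y = np.asarray(y, dtype=float)
    n, k = X1.shape
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ beta
    sigma2 = resid @ resid / (n - k)                        # unbiased residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X1.T @ X1)))  # std. errors of the betas
    t_stats = beta / se
    # H0: beta_j = 0; two-sided p-value P(|Z| > |t|) = erfc(|t| / sqrt(2))
    p_values = np.array([math.erfc(abs(t) / math.sqrt(2)) for t in t_stats])
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
    return beta, se, p_values, r2

rng = np.random.default_rng(0)
kpis = rng.normal(size=(400, 3))                          # hypothetical KPIs
excess_return = 0.5 * kpis[:, 0] + rng.normal(size=400)   # only the first KPI matters here
beta, se, p_values, r2 = ols_summary(kpis, excess_return)
```

With statsmodels installed, `sm.OLS(y, sm.add_constant(X)).fit()` exposes the same quantities as `.params`, `.pvalues` and `.rsquared`, with exact t-based p-values.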

Before I get to test this first model, let me quickly elaborate on one pitfall: collinearity in the matrix of explanatory variables. Confused? Think of it this way: when two variables like RoE and RoA behave very similarly, the least squares algorithm cannot figure out to which of the two it should attribute variation in Excess Returns. The overall fit does not suffer from this (collinearity does not lower R² by itself), but the individual betas become unstable and their standard errors, and hence p-values, get inflated. With KPIs in general, that can easily happen, since ultimately most of them are constructed from the same building blocks. Thus, co-movement is quite likely: if a firm is highly profitable (measured by RoA), it quite likely also does well in terms of revenue and debt. What can we learn from that?

  • More variables do not necessarily mean a better model, especially when they are similar to each other. We need to look out for variables that are clearly distinct from one another by measuring different features of the company.
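One common way to check for this kind of collinearity is the variance inflation factor (VIF): regress each KPI on all the others and compute 1 / (1 − R²). A frequently quoted rule of thumb treats values above roughly 5 to 10 as a warning sign. The sketch below uses hypothetical simulated RoA, RoE and Current Ratio series; it is an illustration, not a step from the original analysis.

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on the others."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        target = X[:, j]
        # Intercept plus every other KPI as regressors
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        resid = target - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        vifs.append(1.0 / (1.0 - r2))
    return np.array(vifs)

rng = np.random.default_rng(1)
roa = rng.normal(size=300)
roe = 2.0 * roa + 0.1 * rng.normal(size=300)  # hypothetical: RoE moves almost in lockstep with RoA
current_ratio = rng.normal(size=300)          # hypothetical: unrelated KPI
vif = variance_inflation_factors(np.column_stack([roa, roe, current_ratio]))
```

Here RoA and RoE get very large VIFs while the Current Ratio stays close to 1, which is exactly the “two similar variables” situation described above.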

Now, let’s see how well these estimated betas can explain the behavior of our test sample. What do I mean by test sample? I randomly split the original sample of 405 firms into a larger training sample (80% of the data), on which the betas are estimated, and a smaller test sample (20%), on which their accuracy is evaluated.

“What on earth am I seeing here?” you may ask. What I did here was run 1000 OLS regressions, reshuffling the training/test split with every iteration. For each regression, I computed the “Success Rate” on the test sample, where I define “Success” as predicting the correct sign of the dependent variable, Excess Return in 2017.
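The repeated-split experiment can be sketched as follows. This is a hypothetical reconstruction, not the original script: it assumes an OLS fit with intercept via NumPy, an 80/20 random split redrawn in every iteration, and “Success” counted as the predicted sign matching the realized sign of the excess return in the test sample. The data are simulated.

```python
import numpy as np

def sign_success_rate(y_true, y_pred):
    """Share of observations where the predicted sign matches the realized sign."""
    return float(np.mean(np.sign(y_pred) == np.sign(y_true)))

def repeated_split_success(X, y, n_iter=1000, train_share=0.8, seed=0):
    """Refit OLS on a freshly shuffled train/test split n_iter times; return test success rates."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), X])   # add intercept column
    n_train = int(round(train_share * len(y)))
    rates = np.empty(n_iter)
    for i in range(n_iter):
        idx = rng.permutation(len(y))           # new random split every iteration
        train, test = idx[:n_train], idx[n_train:]
        beta, *_ = np.linalg.lstsq(X[train], y[train], rcond=None)
        rates[i] = sign_success_rate(y[test], X[test] @ beta)
    return rates

rng = np.random.default_rng(3)
X = rng.normal(size=(405, 7))                   # hypothetical: 7 KPIs for 405 firms
y = 0.2 * X[:, 0] + rng.normal(size=405)        # hypothetical excess returns
rates = repeated_split_success(X, y, n_iter=200)
```

scikit-learn’s `train_test_split(X, y, test_size=0.2)` would handle the shuffling as well; drawing a permutation directly keeps the sketch dependency-free.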

The average success rate is roughly 54.83%, marked by the red vertical line. So, on average, the OLS regression correctly predicts whether a new stock beats the market in about 55 out of 100 cases in 2017. Crucially, this ratio is above the 50% benchmark of flipping a coin as a predictive tool. And remember that this is just one possible measure of success; one could instead define success as landing in the “right neighborhood” of, say, ±3 percentage points.

Marked by the yellow lines you can see the lower and upper bounds of a 2-sigma band, i.e., the mean plus or minus two standard deviations of the 1000 success rates. Roughly 95% of the individual runs fall inside this band. It would be desirable to have a band that does not overlap the 50% hurdle, but simply running more trials would not achieve that: it pins down the average success rate more precisely, while the spread of individual runs stays about the same. And I think it has become apparent that, on average, the algorithm performs better than the coin. But just like with the coin, we might have bad luck and end up with an estimation result in the left tail of the distribution above, since the regression is apparently very sensitive to the composition of its training sample. Note: check later whether outliers are causing these big swings; maybe we find a pattern here.
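The difference between the spread of individual runs and the precision of the average can be made explicit in a few lines. The success rates below are hypothetical draws around 55%, used only to show the calculation: the 2-sigma band describes individual runs, while the standard error of the mean, sd / sqrt(number of trials), is the part that shrinks when more trials are run.

```python
import numpy as np

def summarize_rates(rates):
    """Mean, 2-sigma band of individual rates, and standard error of the mean."""
    rates = np.asarray(rates, dtype=float)
    mean = rates.mean()
    sd = rates.std(ddof=1)
    band = (mean - 2 * sd, mean + 2 * sd)  # covers ~95% of individual runs if roughly normal
    se_mean = sd / np.sqrt(len(rates))     # uncertainty of the average; shrinks with more trials
    return mean, band, se_mean

rng = np.random.default_rng(7)
rates = rng.normal(loc=0.55, scale=0.05, size=1000)  # hypothetical success rates
mean, (lo, hi), se = summarize_rates(rates)
share_inside = np.mean((rates >= lo) & (rates <= hi))
```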

  • One more point I want to stress here is that we must abandon the idea of finding the perfect stock picker; we can only hope to beat the market on average.

This is all interesting and motivating, but ultimately worthless. Nobody is interested in an explanation of why a stock performed the way it did if this explanation cannot be used to explain future movements as well. So, let’s put these betas to the test in the year 2018. Which betas? We have two options. First, the naïve one: use all data from 2017, including potential outliers, and compute one steady set of betas (figure 3). More data is always better, isn’t it? Generally speaking, yes; just think of the law of large numbers. But is it a wise choice here, knowing that some outliers seem to have a very strong influence on the estimate, hinting that the sample is not i.i.d.? What if the same KPI means something very different for a retailer than for a financial institution? Then maybe not, so let’s also try a different approach afterwards and compare the results.

2. The Naïve OLS Estimator for 2018

When we force the algorithm to make a clear choice for 2018 (“Yes or no, will stock x beat the market?”), we get a success rate of 0.536 (217 correct calls out of 405).

In absolute terms, that is 29 more correct than false predictions.
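A minimal sketch of this naïve out-of-sample test: fit one set of betas on all 2017 observations, apply them to the 2018 KPIs, and force a yes/no call for every stock. Names such as `X_2017` and the simulated data are hypothetical stand-ins for the real KPI panel.

```python
import numpy as np

def fit_ols(X, y):
    """OLS betas with the intercept in position 0."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def predict(beta, X):
    """Linear prediction using intercept + KPIs."""
    return np.column_stack([np.ones(len(X)), X]) @ beta

def forced_choice_score(y_true, y_pred):
    """Yes/no call on every stock: returns (success rate, correct minus false)."""
    correct = int(np.sum(np.sign(y_pred) == np.sign(y_true)))
    false = len(y_true) - correct
    return correct / len(y_true), correct - false

rng = np.random.default_rng(11)
true_beta = np.array([0.0, 0.3, -0.2])                               # hypothetical
X_2017, X_2018 = rng.normal(size=(405, 2)), rng.normal(size=(405, 2))  # hypothetical KPIs
y_2017 = predict(true_beta, X_2017) + rng.normal(size=405)
y_2018 = predict(true_beta, X_2018) + rng.normal(size=405)

beta_2017 = fit_ols(X_2017, y_2017)  # one steady set of betas from all 2017 data
rate, diff = forced_choice_score(y_2018, predict(beta_2017, X_2018))
```

As a sanity check on the arithmetic above: 217 correct calls out of 405 means 188 false ones, so the difference is 217 − 188 = 29.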

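How would a fair coin perform? Each toss is Bernoulli distributed with p = 0.5, so the number of correct calls in 405 tosses follows a Binomial(405, 0.5) distribution. The probability of exactly 217 successes is only 0.014, less than 1.5%. The more relevant question, however, is how likely the coin is to do at least this well, and P(X ≥ 217) is roughly 8%.

That is encouraging evidence that our algorithm is superior to the coin toss, although it does not quite clear the usual 5% significance threshold.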
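The coin comparison can be checked exactly with the binomial distribution using only Python’s standard library. `binom_pmf` gives the probability of exactly k correct calls, and `binom_upper_tail` the probability of k or more, which is the one-sided p-value for the question “could a coin have done this well?”.

```python
from math import comb

def binom_pmf(k, n, p=0.5):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_upper_tail(k, n, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

n, k = 405, 217
p_exactly = binom_pmf(k, n)          # exactly 217 correct coin guesses
p_at_least = binom_upper_tail(k, n)  # 217 or more: the relevant one-sided p-value
```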

If we allow the algorithm to stay neutral, it can do even better. Let’s not ask it for a prediction if its estimated excess return lies within ±1 percentage point.

The idea here is that we only want a prediction if the algorithm is reasonably confident in one direction or the other. This “degree of confidence” is represented by tau in the code below.

Now the success rate rises to 0.55, and the difference between correct and false guesses is 31.
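The neutral zone can be sketched like this. How the original code treats neutrals is an assumption on my part: here, predictions with an absolute value below tau are skipped, and the success rate is measured only over the stocks on which the algorithm takes a position. Excess returns are assumed to be in decimals, so `tau=0.01` corresponds to 1 percentage point.

```python
import numpy as np

def score_with_neutral_zone(y_true, y_pred, tau=0.01):
    """Only call stocks with |y_pred| >= tau; the rest count as neutral.

    Returns (success rate among called stocks, correct minus false, number of neutrals).
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    called = np.abs(y_pred) >= tau                  # stocks the algorithm takes a view on
    n_called = int(called.sum())
    correct = int(np.sum(np.sign(y_pred[called]) == np.sign(y_true[called])))
    false = n_called - correct
    rate = correct / n_called if n_called else float("nan")
    return rate, correct - false, int((~called).sum())

rate, diff, n_neutral = score_with_neutral_zone(
    [0.05, -0.02, 0.03, -0.04],    # hypothetical realized excess returns
    [0.02, 0.005, -0.015, -0.03],  # hypothetical predictions; 0.005 falls in the neutral zone
    tau=0.01,
)
```

Raising tau trades coverage for confidence: fewer stocks get a call, but the ones that do are those with the strongest predicted signal.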

It seems that the state of the world in 2018 is about the same as in 2017. By that I mean that the same rules still apply to asset pricing: whatever the machine has learned remains valid over time, at least for this sample.

3. The arguably-a-bit-less-naïve OLS Estimator for 2018

As we’ve seen, it makes quite a difference how we shuffle the training and test sets. My idea here was: out of all the 1000 estimations, which all differ in quality, let me pick the best regression in terms of R² and use these hopefully better-behaved betas for further prediction. And in fact, regression #239 yields the maximum R² of a still miserable but less catastrophic 0.102, more than three times the 0.03 from before! Outliers, which distort the outcome, are most likely not part of the 80% of the sample used here. Further investigation would involve experimenting with the share of stocks included and the number of iterations, but only if this approach proves useful in the first place.
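Selecting the best of the 1000 regressions by R² could look roughly like the sketch below, a hypothetical illustration on simulated, heavy-tailed data. Keep in mind that the R² here is measured on each training subsample, so the winning split tends to be one that happens to leave hard-to-fit observations out; testing its betas on 2018, as done in this section, is what shows whether that helps.

```python
import numpy as np

def r_squared(y, y_hat):
    """Share of the variance of y explained by the fitted values."""
    return 1.0 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)

def best_r2_split(X, y, n_iter=1000, train_share=0.8, seed=0):
    """Fit OLS on n_iter random training subsamples; keep the fit with the highest in-sample R^2."""
    rng = np.random.default_rng(seed)
    y = np.asarray(y, dtype=float)
    X1 = np.column_stack([np.ones(len(y)), X])
    n_train = int(round(train_share * len(y)))
    best = (-np.inf, None, None)                    # (R^2, iteration, betas)
    for i in range(n_iter):
        train = rng.permutation(len(y))[:n_train]
        beta, *_ = np.linalg.lstsq(X1[train], y[train], rcond=None)
        r2 = r_squared(y[train], X1[train] @ beta)
        if r2 > best[0]:
            best = (r2, i, beta)
    return best

rng = np.random.default_rng(5)
X = rng.normal(size=(405, 7))                       # hypothetical KPIs
y = 0.1 * X[:, 0] + rng.standard_t(df=3, size=405)  # heavy-tailed, outlier-prone returns
best_r2, best_iter, best_beta = best_r2_split(X, y, n_iter=200)
```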

While nothing really changed in the restrictive case, where the algorithm was not allowed to stay neutral, things look very different under the non-restrictive regime.

Absolute difference between correct and false guesses = 34.

In fact, not only did the algorithm become bolder in its decision making, yielding fewer “neutrals” for a tau of 1 percentage point, it also made relatively fewer wrong predictions, increasing the difference to 34.


We have seen that, to my surprise, a first attempt at using OLS for stock performance prediction yielded positive results. Would I bet my life savings on this script? Probably not, since I am too heavily invested in meme stocks already, but it motivates me to dig deeper.

In the next episode, I want to take a closer look at probit models, which model the probability of a stock beating the market, and see whether they yield the expected improvements over OLS. I would also like to take some time to talk about model selection, which unfortunately would have gone beyond the scope of today’s episode. In particular, I want to see how we can improve R² and whether a higher R² necessarily leads to more correct predictions after all. Also, I evaluated the predictions over a time horizon of one year. Maybe that’s suboptimal; maybe the model performs better over a single quarter. Again, another possible adjustment, another tunnel of the rabbit hole.

I feel like this project has now really gathered steam, and I, for one, am very excited about the next episode. Keep the feedback and comments coming, they really do mean the world to me!

Thanks again for reading and see you soon for the next episode of A Quants Journey!

Stay tuned and, most importantly, stay healthy!