A Quant's Journal, Episode 1: [hello, wall street]
We kick off this series by building a framework that will allow us to make our first trades. Today's schedule looks as follows:
Get and prepare historical data to work with.
Set up a “trading account”: A dataframe that keeps track of our portfolio and cash positions.
Create a function that makes systematic buy/sell/hold decisions.
Run the algorithm for one year, see what happens and whether everything works (bonus points if the bot doesn't operate at a net loss).
For this first step I use the library yfinance, routed through pandas_datareader's get_data_yahoo(). The function is quite convenient, as it takes only three simple arguments (ticker, start and end date) and returns daily data for the given stock over the desired period. I figured a good start would be ten years of data for the S&P 500 constituents, whose tickers I scrape from Wikipedia.
For now, I am only interested in the closing prices of the stocks for each day. Specifically, I am interested in their log returns, the stock's relative performance from one day to the next.
These daily log returns are easily computed by
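A minimal pandas version, using a toy price series:

```python
import numpy as np
import pandas as pd

close = pd.Series([100.0, 102.0, 99.0, 105.0])  # toy closing prices
log_returns = np.log(close / close.shift(1))    # r_t = ln(P_t / P_{t-1})
```

The first entry is NaN, since there is no previous day to compare against.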
Why use log returns and not linear returns? The problem with the latter is the following: imagine a fund of €1 million. First it loses €500,000 (-50%), only to gain it back the next day (+100%). Obviously, nothing has changed in terms of value, but the fund manager might be inclined to sell this as a success story since, on average, he generated (-50% + 100%)/2 = +25% per period.
Log returns solve this problem, while still yielding approximately the same values for small price changes (since the linear return is the first-order Taylor expansion of the log return at x=0).
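To see both points at once, here is the fund example in numbers: the linear returns average to a misleading +25%, while the log returns cancel exactly because they add up over time:

```python
import numpy as np

prices = np.array([1_000_000, 500_000, 1_000_000], dtype=float)
lin = prices[1:] / prices[:-1] - 1      # linear returns: [-0.5, +1.0]
log = np.log(prices[1:] / prices[:-1])  # log returns: [-0.693..., +0.693...]

print(lin.mean())  # 0.25 -> the "success story"
print(log.sum())   # 0.0  -> nothing actually happened
```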
This is done rather quickly. All I need is a dataframe (normal people call it a table) that keeps track of the number of shares of every stock my bot gets to play with. Also, I need a variable that later serves as the algorithm's cash balance, to account for incoming and outgoing cash flows.
Since modesty is my middle name, I gifted myself a starting portfolio worth $26k of Amazon stock.
Note that I could add any other stock by adding its ticker to the list 'stock'.
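A minimal sketch of that setup (the share count is a placeholder, not the actual position):

```python
import pandas as pd

stock = ["AMZN"]  # add any other ticker here to trade it too
portfolio = pd.DataFrame({"shares": [15]}, index=stock)  # placeholder: ~ $26k at the start date
balance = 0.0     # cash account for in- and outgoing flows
```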
Now comes the exciting part. I need to implement some sort of function that first analyzes the data known up to that point in time and then makes a buy/hold/sell decision based on it.
Now, the only information my little algorithm has to work with is past log returns, which, admittedly, puts him in a tight spot. Luckily for him, I am an economist, and as such I love to simplify the world by making far-fetched assumptions. I make the bold claim that:
On average, returns are positive for the stocks that are being traded.
Returns are distributed somewhat bell-shaped, but most importantly symmetrically, around the mean.
If that sounds cryptic to you: Do not worry, visualization is coming.
Now here's the deal: what if we bought stocks when they are relatively cheap and sold them when they are expensive, since returns will, in the long run, revert to the mean? Ingenious, I know.
But what makes a stock "relatively cheap or expensive"? One vague definition is that it has experienced "unusually strong and stable movement in one direction" lately. In other words, the prices seem to go either up or down because they have done so in the past hours, days or weeks. Traders proudly call these trends "momentum". I compute the momentum by taking a weighted average of the past twenty daily log returns. If it lies above a certain threshold, I consider the stock expensive and expect it to fall soon, while a negative trend allows me to "buy the dip".
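A sketch of that computation. The article does not specify the weighting scheme, so linearly increasing weights on the most recent days are an assumption:

```python
import numpy as np

def momentum(log_returns, window=20):
    """Weighted average of the last `window` daily log returns.
    Linear weights (assumption): the most recent day counts most."""
    w = np.arange(1, window + 1, dtype=float)
    w /= w.sum()  # normalize so the weights sum to one
    return float(w @ np.asarray(log_returns)[-window:])
```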
If returns are approximately normally distributed, then so should their weighted (and thus smoothed) averages be. In the graph to the right, you can see the distribution of these averages, with a positive mean of 0.26% and a standard deviation of 1.23%. So far, the assumptions (positive mean and symmetry) hold (though I doubt they would for, say, any airline in 2020).
Now let's think about thresholds. I need a lower bound, below which the algorithm buys stocks, as well as an upper bound, above which he sells. If both lie at the same distance from the mean, e.g. one standard deviation, then by the symmetry of the distribution and the law of large numbers, my little algorithm should in the end have bought just about as many stocks as he has sold.
Well, actually, I placed the upper bound far above that, making sell decisions five times less likely than buys. But I also increased the number of stocks sold by a factor of five as compensation. To estimate the corresponding probabilities, I used the empirical density function of the smoothed returns. The intuition behind this asymmetry in the traded quantities is to avoid hurried profit-taking while also reducing the danger of holding too much money in the savings account. That would be a problem, since the opportunity cost of holding cash is the single biggest reason why people invest their money in the first place.
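Putting bounds and quantities together, the decision rule looks roughly like this. The exact upper bound is illustrative; the source only says it sits far above the symmetric one, so that sells come out about five times rarer:

```python
MEAN, STD = 0.0026, 0.0123  # from the empirical distribution above

LOWER = MEAN - STD          # buy one standard deviation below the mean
UPPER = MEAN + 2 * STD      # illustrative: "far above" the symmetric bound

def decide(mom, lot=1):
    """Map a momentum reading to an order (side, quantity)."""
    if mom < LOWER:
        return ("buy", lot)
    if mom > UPPER:
        return ("sell", 5 * lot)  # sells are rarer, so sell 5x the quantity
    return ("hold", 0)
```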
That was a lot of text and a lot of talk about statistics. Let us turn to something more tangible, namely the question: how did the algorithm perform compared to the market? I ran the code from October 2019 to October 2020; let's look at the result:
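For reference, the year-long run boils down to a daily loop like this self-contained, single-stock sketch (thresholds, weights and lot sizes are illustrative stand-ins, not the exact values used):

```python
import numpy as np
import pandas as pd

def run_backtest(close, shares=15, cash=0.0, window=20,
                 lower=-0.0097, upper=0.0272, lot=1):
    """Each day: momentum of the past `window` log returns,
    then trade at that day's closing price."""
    log_ret = np.log(close / close.shift(1)).to_numpy()
    w = np.arange(1, window + 1, dtype=float)
    w /= w.sum()
    for t in range(window + 1, len(close)):  # skip the NaN on day 0
        mom = w @ log_ret[t - window:t]
        price = close.iloc[t]
        if mom < lower:                      # "buy the dip"
            shares += lot
            cash -= lot * price
        elif mom > upper:                    # rarer sells, 5x the quantity
            shares -= 5 * lot
            cash += 5 * lot * price
    return shares, cash
```

On a flat price series nothing happens; on a steadily falling one, the bot keeps buying the dip and the cash balance goes negative, i.e. it takes up credit.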
Here are the facts: by October 14th, the algorithm had beaten the market by $3,809 (or 14 percentage points) over 54 orders, yielding an average return of $70 per trade. During the year, the portfolio value was above the benchmark on an aggregate of 237 days and below it on 19. Sharpe-ratio-wise, the algorithm also beat the market (.12 vs. .12) while exhibiting slightly higher volatility.
Most of the time, the bot does not do much, because the market is moving sideways. But when the market is volatile, the algorithm amplifies the swings (explaining the increased volatility), as one can see in the graph above.
Needless to say, I can't help but feel some pride for this brave little trader of mine. He has taken his first steps and did not fall off a cliff. But to be honest, this has more to do with the family-friendly environment I exposed him to than anything else.
What are the crucial limitations of this result? First, let us look at the cash balance.
Interestingly, the algorithm is rather reluctant to spend its money, leading to at times very high inefficiencies. That would not necessarily be a problem; surely one could program him to invest excess cash in a safe haven or something of the like. The bigger problem is that he also takes on credit, up to 20% of the original portfolio value. This leverage comes at a price, and I find it hard to believe that this toddler of an algorithm could convince a bank manager (or his investors) to lend him more money in the middle of, say, the COVID crisis.
Anyhow, while these problems seem manageable, the most critical part of the decision-making is the choice of threshold. This threshold is, as discussed, placed symmetrically around a known mean. With this framing, the problem becomes very apparent: how is the algorithm supposed to know the true mean in advance?
One could of course use the mean of past observations as a proxy, but this is an especially dangerous game if the traded stock's value does not follow the assumed trend. Why? If my estimate of the mean is too low (I underestimate the stock's future performance), the bot will make disproportionately many sell decisions, so those successful stocks end up underrepresented in my portfolio. If my estimate is too high (I overestimate future performance), I constantly buy the dip and rarely sell, so my portfolio eventually becomes "the dip". The only way not to lose money is to hit the bullseye, because any deviation from the true mean is punished.
In other words, after a while the portfolio inevitably becomes a shelter and gathering point for the underperforming and battered. So far, my bot makes a better social worker than a stock picker.
So, what is today's bottom line? Considering that I am only at stage one of the grand master plan, this has been quite a success. I established a simple yet so far functional framework that allowed me to play around with a first strategy. Unfortunately, I cannot award myself the bonus points, making it more of a B+ in the end. But even working on this simple mechanism gave me plenty of new ideas and inspiration for extensions, upgrades or altogether new approaches. In the next episode I want to look at forecasting performance using the firms' financials in a logistic regression. I usually detest the concept of wish lists, but this year all I want for Christmas is big data. So, if Santa or one of the readers has a good source for said data (I am thinking historical KPIs or, even better, entire quarterly reports for all of the S&P 500), that would be a highly appreciated early Christmas gift.
Speaking of Christmas and forecasting: I will not promise that episode 2 will be out before the holidays, since exams are creeping up here in Italy. If it does get published earlier, it means that I am most likely procrastinating while I should be studying.
Let me finish with a big thank-you to all who follow this journal; I did not expect such positive feedback and interest right from the start. If you have any suggestions or questions about this project, please let me know. I hope you will all be back for the next episode of A Quant's Journal, when we investigate automated stock picking. Stay tuned and, most importantly, stay healthy,