Friday, April 7, 2017

How Finding The Best Stocks Is a Needle-in-the-Haystack Problem

Finding the best stocks is, in certain ways, like finding a needle in a haystack. I don't mean with regard to difficulty necessarily, but procedure. A while ago I was watching a show called Mythbusters: The Search, in which contestants competed to become part of the next generation of Mythbusters. One of the problems they had to solve was how to find a needle in a haystack. Contestants invented ways of burning the hay that would not burn the needle, using magnets that would attract the needle but not the hay, and using water in which the hay would float and the needle would sink, among other ideas for filtering out the noise.

One of the more eccentric contestants, Hackett, suggested that this was a "signal and noise problem". He explained that you have a lot of noise (hay) and have to filter it out without also filtering out the signal (the needle).

I realized that stock-picking methods involve mostly the same process. A lot of people like to look only at the winners and see what they have in common. The problem with that is you might be very effective at identifying traits that are also true of losing stocks. Even if 75% of winning stocks have a certain characteristic, if 90% of all stocks share that trait then the trait in itself is not useful. Another problem is that even if you can compare each trait to the baseline and say that 90% of big winners paid no dividend, 75% of stocks that made big moves were under $10 per share, and 60% of stocks that made big moves were under a $1B market cap, and even if each of those percentages is higher than the baseline rate for that stat, it is still possible that this combination of filters isn't the best one. For example, perhaps although only a small percentage of all stocks pay no dividend, 92% of sub-$10 stocks with a small market cap pay no dividend, in which case the dividend filter adds little once the other two are applied. You'd have to decide which filter is more important.

Nevertheless, as long as you set up an objective way to increase your hit rate of capturing big winners while reducing your chance of picking a non-winner, and still leave yourself enough opportunities, your methodology is productive.

I want to focus here on methods, not stocks and not markets. You could look at quarterly or yearly performance, but for this example I'm going to use daily performance, since you can run it every day and build a large sample of what works. You can also plot some measure of market condition in terms of trend, breadth, and volatility, in case you notice that certain scans work conclusively better in certain conditions.

So here is the method:
1) Define a universe of stocks to scan (say 5,500 stocks as an example).
2) Develop scan A to find stocks up more than 1% and scan B to find stocks up more than 4% in a single day (or 3%, or 5%, or whatever you prefer).
3) Either subtract those results from the universe, or write scans for stocks NOT up more than 1% and NOT up more than 4%.
4) Determine the baseline "hit rate": the percentage of all stocks that are up 1%+ versus up 4%+.
This establishes a baseline. Let's imagine that out of 5,500 stocks, 400 are up 1% or more and 150 are up 4% or more.
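The baseline arithmetic above can be sketched in a few lines of Python; the counts are the hypothetical numbers from the example, not real market data:

```python
# Hypothetical counts from the example: a 5,500-stock universe,
# 400 up 1%+ and 150 up 4%+ on the day.
universe_size = 5500
up_1pct = 400
up_4pct = 150

baseline_1pct = up_1pct / universe_size  # chance a random pick is up 1%+
baseline_4pct = up_4pct / universe_size  # chance a random pick is up 4%+

print(f"Baseline 1%+ hit rate: {baseline_1pct:.2%}")  # 7.27%
print(f"Baseline 4%+ hit rate: {baseline_4pct:.2%}")  # 2.73%
```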

Eliminating The Noise And Burning The Hay

Now come up with a variety of scans based on what happened prior to today (not including today), such as: the stock was oversold 3 days ago, or the stock was down 3 days in a row until today. Or perhaps you come up with a Bollinger Band squeeze, or a scan where a stock, over the several days prior to the breakout, closed lower but not too much lower and never moved more than 1%. Come up with at least a few methods across consolidating volatility, oversold conditions, breakouts, trends, etc.
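As one concrete example, a "down 3 days in a row until today" filter could be sketched like this in Python. The function name and the list-of-closes data layout are just illustrative, not from any particular scanning tool:

```python
def down_three_in_a_row(closes):
    """True if each of the three closes before today fell from the
    previous close. `closes` is ordered oldest to newest, and the
    final element is today's close, which the scan ignores."""
    if len(closes) < 5:
        return False  # not enough history to evaluate the pattern
    prior = closes[:-1]  # everything before today
    return all(prior[i] < prior[i - 1] for i in range(-3, 0))

print(down_three_in_a_row([10.0, 9.5, 9.2, 9.0, 9.4]))  # True: 3 straight declines
print(down_three_in_a_row([10.0, 9.5, 9.6, 9.0, 9.4]))  # False: an up day interrupts
```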


1) Develop these scans.
2) Run them against all stocks to see the total number of stocks that pass the filter (say 500 stocks for a particular scan).
3) Add a condition that a stock passes the scan AND is up more than 1% (or 4%), or sort by today's change and count them (say 60 stocks in the scan are up 1% or more and 30 of them are up 4% or more).
4) Determine whether randomly selecting a stock from this scan beats randomly selecting from all stocks in terms of hit percentage, and if so, by how much.
Example: 400/5,500 stocks, or 7.27%, are up 1%+. 150/5,500, or 2.73%, are up 4%+. In the example scan, 60/500, or 12%, are up 1%+, and 30/500, or 6%, are up 4% or more. This is a clear increase in your success rate over picking stocks randomly, so the filter added value today.
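The comparison in that example reduces to a small helper that computes the scan's "lift" over the baseline, using the same hypothetical counts:

```python
def hit_rate_lift(universe_size, universe_hits, scan_size, scan_hits):
    """Compare the scan's hit rate to the whole-universe baseline.
    Returns (baseline, scan_rate, lift); lift > 1 means the filter
    improved the odds of a random pick being a winner."""
    baseline = universe_hits / universe_size
    scan_rate = scan_hits / scan_size
    return baseline, scan_rate, scan_rate / baseline

# Numbers from the worked example above: 4%+ movers.
base, rate, lift = hit_rate_lift(5500, 150, 500, 30)
print(f"4%+ movers: baseline {base:.2%}, scan {rate:.2%}, lift {lift:.2f}x")
# lift is 2.20x: the scan more than doubled the odds of catching a 4%+ mover
```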

Repeat this process
I would run multiple scans and filters, and even multiple "universes" from which to run the scan. Will scanning just the Russell 3000 produce a better hit rate than all stocks? What if you create a list of stocks that IPO'd in the last 5 years versus more than 5 years ago? What about stocks with positive earnings versus negative? What about stocks with accelerating earnings growth (growth rate greater than last year's) versus decelerating? What if you create a list of stocks that fell out of favor, peaked in 2015 or earlier, and are still down from that peak, versus those within 15% of a prior peak set before 2015 (possible multi-year breakouts from ranges)? Or a list of stocks in a longer-term uptrend, or oversold on a longer-term basis? The universe is separate from the scan, and it needs a fairly large number of names to be statistically valid. You can also remove the results that would have stopped out, if you want.

You can improve the results by identifying the samples most representative of the current market and sticking to methods that worked in the past under those EXACT same conditions.

This may seem like a lot of work, but once it's done you can have confidence in the process and selection methods. Once you have collected samples over various periods and begin to determine the best scans to trade from, you may even develop the habit of reviewing the lists, building your own eye for unique features that are hard to quantify in a scan or algorithm, and focusing only on the best names from the list according to your experience.

If you'd like to shorten the wait, you can set up the 4% movers for yesterday, build more scans, and keep running them for each day before today to see how the scan would have come out. It's more work, but you can do it all at once if you'd like. Or perhaps you'd rather do all of this over the weekend for each day of the week, instead of once after the close every day. Or perhaps you'd like to play for weekly or monthly moves of larger amounts rather than 1-day moves.

More Accurate Modeling
A more accurate way to model how your portfolio will perform over a period, assuming certain conditions, is to look at five different distributions of outcomes from the scan rather than just the success rate: sell at the stop, sell at the target, sell at break even, sell below the stop, or sell above the target. Using spreadsheets and a small app that runs Monte Carlo simulations on random numbers, you can set it up so that a range of random numbers between zero and one represents an outcome consistent with the actual results. For example, if 40% of stocks sell at the stop rather than gapping below it, you'd set it up so that a draw less than or equal to 0.40 results in a stop out, set up 5 different cells that output 1 for the outcome that occurred and 0 otherwise, and translate that into a result for a single trade. Modeling simultaneous trades is more difficult, particularly if your outcomes have any degree of correlation (they will), so modeling an entire portfolio of multiple trades with overlapping holding periods and different entry and exit times is really hard for me to program. But you can do a simplified version of ALWAYS holding 10 stocks, so that your results over 4 trade periods don't compound more than 4 times on the sum of all 10 results for each period.
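The random-number-to-outcome mapping above can be sketched in Python. Only the 40% stop-out figure comes from the example; the other probabilities and the per-outcome returns are placeholder assumptions:

```python
import random

# (label, probability, return) rows. The 0.40 stop-out probability is
# from the example above; everything else is an illustrative guess.
outcomes = [
    ("stopped out",        0.40, -0.07),  # sold at the stop
    ("gapped below stop",  0.05, -0.12),  # worse than the stop
    ("break even",         0.15,  0.00),
    ("hit target",         0.30,  0.15),
    ("gapped past target", 0.10,  0.22),
]

def simulate_trade(rng):
    """Map one uniform draw in [0, 1) onto the outcome table."""
    r = rng.random()
    cumulative = 0.0
    for name, prob, ret in outcomes:
        cumulative += prob
        if r < cumulative:
            return name, ret
    return outcomes[-1][0], outcomes[-1][2]  # guard against rounding

rng = random.Random(42)  # seeded so the run is reproducible
results = [simulate_trade(rng)[1] for _ in range(10_000)]
print(f"Average return per simulated trade: {sum(results) / len(results):.3%}")
```

With these placeholder numbers the expected value per trade works out to about +3.3%, and the simulated average converges toward that as the number of draws grows.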

This can be set up with rules such as: if the result of a particular trading year, or a series of 100 or 1,000 trades, is over a certain amount, output a 1, otherwise a 0. A Monte Carlo simulation of the average then gives you the probability that the outcome is reached. So, for a particular allocation and risk percentage, you can determine how many series of trades meet your minimum threshold (what percentage of these periods are not down more than 20%?) and maximum threshold (what percentage of these periods are up more than 50%?). Or you can look at the total distribution of all simulated outcomes at the end of a period.
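A minimal sketch of that threshold idea in Python; the win rate, win/loss sizes, trade count, and 50% goal are all purely illustrative parameters, not figures from the post:

```python
import random

def series_hits_goal(rng, n_trades=100, win_rate=0.4,
                     win=0.10, loss=-0.05, goal=0.50):
    """Simulate one series of trades compounding a $1 account and
    report whether it finishes up more than `goal` (the 1/0 rule)."""
    equity = 1.0
    for _ in range(n_trades):
        equity *= 1 + (win if rng.random() < win_rate else loss)
    return equity >= 1 + goal

rng = random.Random(7)  # seeded for a reproducible run
n_series = 5000
hits = sum(series_hits_goal(rng) for _ in range(n_series))
print(f"P(a 100-trade series finishes up more than 50%): {hits / n_series:.1%}")
```

Averaging the 1s and 0s over many simulated series is exactly the probability estimate described above; swapping the condition to `equity >= 0.80` would instead estimate how often a series avoids a 20%+ drawdown over the period.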

I'm planning to try a similar process at some point, but I'm still setting it up. This post serves as me getting the logic down so I have an idea of how to run it, but now I actually have to build the scans and process. I plan to track it in an Excel spreadsheet which I have not yet built. I've attempted such a feat for quarterly results and some fundamentals, but those results were based upon recent data rather than the data BEFORE the stock made its move, so they may not be accurate.