The use of the 'random forest' algorithm in trading
Barry Copeland, 25 / February / 18
"Random forest": how is this algorithm used in trading?
“Random forest” is a very popular modeling algorithm, commonly programmed in high-level languages such as Python. This machine learning algorithm was developed by the statisticians Leo Breiman and Adele Cutler; Breiman's paper describing it was published in 2001. It has changed little since then, which speaks to its quality and versatility. It is used in building trading robots and in algorithmic trading.
Indeed, "random forest" is one of the few universal modeling algorithms.
It is universal because:
1) it is applicable to a wide variety of problems, about 70% of all practical tasks (apart from tasks involving images);
2) there are versions of “random forest” for classification, regression, clustering, anomaly detection, feature selection and more.
A “random forest” is a set (an ensemble) of decision trees. In regression, the answers of all the trees are averaged; in classification, the decision is made by majority vote.
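The two ways of combining the trees' answers can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the inputs are assumed to be the outputs already produced by the individual trees.

```python
from collections import Counter
from statistics import mean

def aggregate_regression(tree_outputs):
    """Regression: the forest's answer is the average of the trees' answers."""
    return mean(tree_outputs)

def aggregate_classification(tree_votes):
    """Classification: the forest's answer is the most frequent class.

    On a tie, Counter.most_common keeps the class that was seen first.
    """
    return Counter(tree_votes).most_common(1)[0][0]

# Hypothetical outputs of three trees
print(aggregate_regression([1.0, 2.0, 3.0]))             # 2.0
print(aggregate_classification(["buy", "sell", "buy"]))  # buy
```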
All trees are built independently of one another according to the following scheme:
• A random subsample of size samplesize is drawn from the training sample (possibly with replacement), and a tree is built on it; each tree gets its own subsample. (By the way, these random subsamples and the trees grown on them are what give the random forest algorithm its name.)
• To build each split in a tree, max_feature random features are examined (each new split gets its own random features).
• The best feature and the best split on it are selected according to a predefined criterion. Trees are usually grown until the sample is exhausted, that is, until only representatives of a single class remain in the leaves.
• Modern implementations provide parameters that limit the height of the trees, the number of objects in the leaves, and the number of objects in a subsample at which a split is still performed.
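The scheme above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a production implementation: the split criterion is assumed to be Gini impurity, only the tree height is limited (via a hypothetical max_depth parameter), class labels are assumed not to be tuples, and all function names are invented for the example. The names samplesize and max_feature follow the text.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels (assumed split criterion)."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def majority(labels):
    """Most common label; used for leaves and for the final vote."""
    return Counter(labels).most_common(1)[0][0]

def best_split(rows, labels, feature_ids):
    """Best (feature, threshold) among feature_ids by weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, max_feature, max_depth, rng, depth=0):
    """Grow one tree: inner nodes are tuples, leaves are class labels."""
    if len(set(labels)) == 1 or depth >= max_depth:
        return majority(labels)
    # A fresh random set of features is examined at every split.
    feature_ids = rng.sample(range(len(rows[0])), max_feature)
    split = best_split(rows, labels, feature_ids)
    if split is None:
        return majority(labels)
    f, t = split
    l_rows = [r for r in rows if r[f] <= t]
    l_labels = [y for r, y in zip(rows, labels) if r[f] <= t]
    r_rows = [r for r in rows if r[f] > t]
    r_labels = [y for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree(l_rows, l_labels, max_feature, max_depth, rng, depth + 1),
            build_tree(r_rows, r_labels, max_feature, max_depth, rng, depth + 1))

def build_forest(rows, labels, n_trees, samplesize, max_feature, max_depth, seed=0):
    """Each tree is grown on its own subsample drawn with replacement."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(samplesize)]
        forest.append(build_tree([rows[i] for i in idx], [labels[i] for i in idx],
                                 max_feature, max_depth, rng))
    return forest

def predict_tree(node, row):
    """Walk from the root down to a leaf and return its class label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def predict_forest(forest, row):
    """Classification by majority vote over all trees."""
    return majority([predict_tree(tree, row) for tree in forest])

# Toy data: two well-separated classes described by two features
rows = [(1, 2), (2, 1), (2, 3), (3, 2), (8, 9), (9, 8), (7, 8), (8, 7)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = build_forest(rows, labels, n_trees=25, samplesize=8, max_feature=1, max_depth=3)
print(predict_forest(forest, (1, 1)), predict_forest(forest, (9, 9)))
```

In practice one would normally rely on an existing library implementation; the sketch is only meant to show where the random subsample, the random features and the height limit enter the procedure.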
Of course, the more trees there are in the "random forest", the higher the quality of the overall solution, but the time needed to tune and run the RF grows proportionally. The quality of the final solution thus depends directly on the number of trees; in theory, on the training sample it can reach 100%. However, to avoid needlessly increasing the data processing time, you can estimate how many trees will be enough.
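Why the gain from adding trees levels off can be illustrated with an idealized calculation. The sketch below assumes that every tree votes independently and is right with the same probability p; real trees are correlated, so actual gains are smaller. The function name and the value p = 0.6 are chosen purely for illustration.

```python
from math import comb

def majority_accuracy(n_trees, p):
    """Probability that more than half of n_trees independent voters are right."""
    need = n_trees // 2 + 1
    return sum(comb(n_trees, k) * p**k * (1 - p)**(n_trees - k)
               for k in range(need, n_trees + 1))

# Odd numbers of trees avoid tied votes
for n in (1, 11, 51, 101, 201):
    print(n, round(majority_accuracy(n, 0.6), 3))
```

The printed values rise quickly at first and then flatten, while the cost of building the forest keeps growing in proportion to the number of trees. This is the trade-off to weigh when deciding how many trees are enough.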
In trading, strategies based on the “random forest” algorithm can use the scheme above to analyze a derived parameter that combines several of the characteristics most important to a trader.
For example, the trader and Stanford graduate David Montag used the "random forest" algorithm to create a fairly successful trading strategy on the futures market, analyzing historical trading data for a certain period. The graph above shows the performance of Montag's strategy: a return of 6.63% and a volatility of 5.58%, with a fairly high Sharpe ratio of 1.18.
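The quoted figures are roughly consistent with the usual definition of the Sharpe ratio, the return in excess of the risk-free rate divided by volatility. The check below assumes a risk-free rate of zero; that is an assumption made here, since the article does not say how the ratio was computed.

```python
def sharpe_ratio(ret, volatility, risk_free_rate=0.0):
    """Excess return per unit of volatility (risk-free rate assumed to be 0)."""
    return (ret - risk_free_rate) / volatility

ratio = sharpe_ratio(6.63, 5.58)  # return and volatility in percent
print(round(ratio, 2))            # 1.19, close to the quoted 1.18
```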
Algorithms such as "random forest" are helping machine learning grow in popularity.