Backtesting has fit a model to a set of data. Optimization has tried many combinations of rules and parameters. The results look good, and the result at the top of the optimization report is the best of them.
While there is always a best, we do not know whether this particular best is good enough. We need a technique and some metrics to estimate the risk and profit potential of trading the system. The validation phase will help us learn about the predictive ability of the trading system.
The scientific method requires that the system be tested using new data. Validation data. Out-of-sample data. Data that has not been used before. Time series data needs special attention because the order of the data points must be handled properly. Financial time series need special attention because profitable trading of a system removes some of the inefficiency that the model was designed to identify. Over time, what were once profitable signals become more difficult to identify and less profitable to trade.
My wife and I have several raised bed garden plots at our home. Some have permanent crops of asparagus and raspberries. Some are replanted annually with tomatoes, peppers, squash, beans, peas. Each winter we read magazines and gather catalogs in a search for ideas, seeds, and starts for the next season. As spring warms the soil, we plant; as crops mature, we harvest and eat.
Everything looks good in the catalog. But we cannot eat that. Backtesting trading systems is to trading system development what reading the catalogs is to gardening — a search for good ideas. Validation is to trading system development what eating fresh produce is to gardening — evidence that the choices made were good ones.
The gold standard of validation for traditional trading systems is the walk forward process. Walk forward validation can also be used with machine learning systems. But machine learning has another option for validation (cross-validation) while traditional systems have only walk forward. So we will cover the walk forward technique here, and refer back to this article when we discuss validation of machine learning systems.
There are two periods of time and their associated sequences of data associated with fitting time series data. The model is fit to the data in the in-sample (IS) period using backtesting and optimization. After fitting, the model is tested using the out-of-sample (OOS) data.
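The split can be sketched in a few lines. This is a minimal illustration, not a recommendation: the 80/20 proportion is an assumption, and the only essential points are that the data stays in chronological order and that the OOS slice is the most recent one.

```python
# Sketch: chronological in-sample / out-of-sample split of a time series.
# The 20% OOS fraction is an arbitrary choice for illustration.

def split_is_oos(prices, oos_fraction=0.2):
    """Split a chronologically ordered series; OOS is the most recent slice."""
    cut = int(len(prices) * (1.0 - oos_fraction))
    return prices[:cut], prices[cut:]

prices = list(range(100))           # stand-in for daily closes, oldest first
is_data, oos_data = split_is_oos(prices)
print(len(is_data), len(oos_data))  # 80 20
```

Note that the split is never random: shuffling would destroy the time ordering and leak future information into the fit.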
It is particularly important that you have confidence in your objective function. The fitting to the in-sample data is done through an automated optimization. For each of the many combinations of rules and parameters that are tested, the objective function score is noted. After all combinations have been tested, the results are sorted into order by the objective function score. The single combination that scored highest is deemed to be the best, and its rules and parameters are used to test the OOS period. You will not have an opportunity to examine the list of results. You must be confident that the one test scored as best is actually the best according to your personal preferences — or at very least you must be confident that whatever the top ranked model is, you are satisfied it is acceptable.
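The ranking step can be sketched as follows. The `backtest` and `objective` functions here are hypothetical placeholders standing in for a real backtester and for whatever objective function you have decided to trust; the moving-average parameter grid is likewise an assumption for illustration.

```python
# Sketch: score every parameter combination in-sample, rank by the
# objective function, and carry only the top-ranked combination forward.
from itertools import product

def backtest(fast, slow, is_data):
    # placeholder: would return the list of trade returns for this combination
    return [(slow - fast) * 0.001 - 0.01, 0.02]

def objective(trade_returns):
    # placeholder objective: net profit (substitute the metric you trust)
    return sum(trade_returns)

is_data = []                        # the in-sample data would go here
results = []
for fast, slow in product([5, 10, 20], [50, 100, 200]):
    score = objective(backtest(fast, slow, is_data))
    results.append((score, fast, slow))

results.sort(reverse=True)          # best objective score first
best_score, best_fast, best_slow = results[0]
```

Only `results[0]` is carried into the OOS test; the rest of the sorted list is never examined, which is why the objective function must already encode your preferences.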
It is important that the out-of-sample (OOS) period and its data are both more recent than the in-sample period and have not been used before.
For best results, the distribution of the signals and trades must remain stationary through the total length of the two periods. Eventually, when the system passes validation and is moved to trading, live trading itself becomes the out-of-sample period. In order for the model to continue to signal profitable trades, both in validation and in live trading, there must not be serious distributional drift beyond the in-sample period. It typically takes some trial and error to determine the length of time a distribution remains stationary. Test that before beginning the walk forward, then decide on the length of each period.
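A crude drift check is sketched below, comparing summary statistics of the two halves of a series of trade returns. This is a minimal heuristic under assumed data, not a proper statistical test; a two-sample test such as Kolmogorov-Smirnov (`scipy.stats.ks_2samp`) would be stronger. The `tol` threshold is an arbitrary assumption.

```python
# Sketch: flag distributional drift when the means of the two halves of a
# return series differ by more than tol pooled standard deviations.
from statistics import mean, stdev

def drifted(returns, tol=0.5):
    half = len(returns) // 2
    a, b = returns[:half], returns[half:]
    pooled = (stdev(a) + stdev(b)) / 2
    return abs(mean(a) - mean(b)) > tol * pooled

stable = [0.01, 0.02, -0.01, 0.01, 0.015, -0.005, 0.01, 0.02]
print(drifted(stable))
```

Running a check like this over candidate window lengths is one way to do the trial and error mentioned above before committing to the walk forward period lengths.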
The walk forward process itself is a sequence of several steps, each fitting the model to the in-sample data, then testing it on the out-of-sample data. After each step, the beginning and ending dates of both periods are walked forward by the length of the OOS period.
As described earlier, the trading results over the in-sample period have no value in estimating future performance. Instead, concatenate all of the OOS trades together and use them. That set of trades is the best estimate of future performance of the system.
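The loop described above can be sketched as follows. Here `optimize` and `trade` are hypothetical placeholders for your optimizer and backtester, and the window lengths are arbitrary assumptions; the structure to note is that both windows slide forward by the OOS length each step, and only the OOS trades are accumulated.

```python
# Sketch of the walk forward loop: fit on the IS window, test on the
# following OOS window, then walk both windows forward by the OOS length.

def optimize(is_data):
    return {"param": max(is_data)}           # placeholder fit

def trade(params, oos_data):
    return [x * 0.001 for x in oos_data]     # placeholder OOS trade returns

data = list(range(120))      # stand-in series, oldest first
is_len, oos_len = 40, 10

oos_trades = []              # concatenated OOS results across all steps
start = 0
while start + is_len + oos_len <= len(data):
    is_data = data[start : start + is_len]
    oos_data = data[start + is_len : start + is_len + oos_len]
    params = optimize(is_data)
    oos_trades.extend(trade(params, oos_data))
    start += oos_len         # walk both windows forward by the OOS length

# oos_trades is the concatenated out-of-sample record described in the text
print(len(oos_trades))
```

Each OOS slice is traded exactly once, with parameters fit only to data that preceded it, so the concatenated `oos_trades` list mimics what real-time trading of the system would have produced.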
What You Can Trust
One of the most serious difficulties we all have is avoiding contamination of the out-of-sample data. You understand that backtest results have no value in estimating future performance, so you divide the data into in-sample and out-of-sample, fit using the in-sample, and validate using the out-of-sample. Always, it seems, a review of the OOS results shows a period of poor performance you would like to eliminate. You think of a change to the model that will take care of that, change the rules and parameters, and reoptimize — all in-sample. Another validation using the out-of-sample data shows the problem has disappeared. Even though all of the fitting was done using the in-sample data, you have made a change to the system based on out-of-sample test results, and in the process you have contaminated the out-of-sample data. The boundary between the in-sample and out-of-sample data is the boundary between the past and the future. When you use a result from the out-of-sample test, you create a future leak. You have used data from the future that you would not have available in real time. You need a new out-of-sample period for a new validation.
The same situation occurs when the results of a walk forward run are examined, the code changed, and the walk forward repeated. The contamination may not be as serious because there are several time periods in the walk forward results and the best parameters were chosen by the objective function. But you do not know how serious the contamination is until additional out-of-sample results have been received.
If you have done the development work yourself, you know how rigorous the validation has been. But what if the development work was done by someone else, and an attractive equity curve is presented with the assurance that it is truly out-of-sample? There is no way for any of us to tell whether any result we did not do ourselves has followed high quality technique and is safe to trade. You have three choices:
- Replicate the development and validation.
- Shadow trade the system and determine whether the shadow trades fit the distribution that the results you have been shown imply.
- Trade the system live and let the market perform the validation for you. Which it will gladly do for a small fee.
If your interest is exclusively in indicator-based models, go on to Trading Management.