This page is a “Table of Contents” for the series of articles that discuss aspects of trading system development.
Here is a short menu with links directly to each article. Or, scroll down through a series of short introductions to the individual articles.
There is a goal common to all traders:
To have confidence that the signals generated by the trading system precede trades that provide rewards adequate to compensate for the risk.
- The key word is confidence.
- The primary limitation is risk.
The process we will use closely follows the scientific method.
Trade Quality — Metrics
During development of a trading system we fit the model to the in-sample training data. Parameters are adjusted, each unique set of parameters defining an associated model. We are searching for the model that best fits the data. Best is defined by a formula — an objective function computed using the metrics associated with the trades, such as percent of trades that are winners, ratio of average win to average loss, maximum drawdown, etc.
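As a sketch of what such an objective function might look like (the metric weighting here is an illustrative assumption, not a formula from this series), a score can be computed directly from a list of per-trade returns:

```python
# Illustrative objective function scoring a list of per-trade returns,
# where 0.02 means a +2% trade. The weighting of the metrics is an
# assumption for the sketch, not the author's actual formula.
def objective(trades):
    wins = [t for t in trades if t > 0]
    losses = [t for t in trades if t <= 0]
    # degenerate cases (no trades, all wins, or all losses) score zero
    if not trades or not wins or not losses:
        return 0.0
    pct_winners = len(wins) / len(trades)
    avg_win = sum(wins) / len(wins)
    avg_loss = abs(sum(losses) / len(losses))
    # reward a high win rate and a high win/loss ratio; higher is better
    return pct_winners * (avg_win / avg_loss)

print(objective([0.02, -0.01, 0.03, -0.02, 0.01]))  # 0.8
```

During optimization, the parameter set whose trades maximize this score would be selected as "best."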
One of the general principles of modeling and forecasting is that the out-of-sample data that will later be processed by the fitted model is similar to the training data. Specifically to technical analysis, profitable systems depend on the future resembling the past. That is, that when the model is applied to new data the trade results will have the same characteristics as those found in the training data during development. Before going further, we need to understand three important concepts:
- population versus sample
Risk and Reward — safe-f and CAR25
A trading-savvy friend asked a pertinent question recently. She had some funds available and was considering three alternatives. She had tested them using good modeling and simulation technique, and all three seemed safe. Safe in the sense that all three traded liquid issues and out-of-sample results were similar to in-sample backtests. The profit potentials for the three were 20% per year, 10% per year, and 2% per year.
She recognized that the one that offered 2% was probably her “risk free” alternative. It would be the place to park funds when no other system needed them for an active position. It was not a futures contract, and there were no options, so there was no leverage available to raise the annual return.
As we talked about the 10% and the 20% systems, she said that she would be fine trading any system that had a risk within her tolerance. She described her tolerance as wanting to keep drawdowns limited — no more than 10% from highest equity. We agreed that there is always some chance of a “flash crash.”
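Her drawdown limit can be checked directly against any simulated equity curve. A minimal sketch of that calculation, measuring the largest percentage decline from the highest equity reached so far:

```python
# Maximum drawdown of an equity curve, expressed as the largest
# fractional drop from the running peak (0.10 means a 10% drawdown).
def max_drawdown(equity):
    peak = equity[0]
    worst = 0.0
    for e in equity:
        peak = max(peak, e)           # highest equity seen so far
        worst = max(worst, (peak - e) / peak)
    return worst

print(max_drawdown([100, 110, 99, 105, 120, 102]))  # 0.15
```

A system whose simulated curves regularly exceed 0.10 on this measure would fall outside her stated tolerance.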
Impulse versus State Signals
Impulse signals are Buy and Sell signals that occur at the beginning and end of trades. They often occur for a single bar, such as the bar where an indicator crosses a critical level, or when one of the trading rules is satisfied. Impulse signals are the default type of signal for most traditional trading system development platforms.
State signals have a value for every bar. The signal tells the trader what the position should be for the next period. In a system that trades a single stock and alternates between long and flat positions, there are two states: I will refer to them as beLong and beFlat. State signals can be used with traditional platforms and are always used with machine learning.
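The relationship between the two signal types can be sketched in a few lines: a state series (1 for beLong, 0 for beFlat) is converted to impulse signals wherever the state changes. The encoding and names are illustrative assumptions:

```python
# Derive impulse (Buy/Sell) signals from a state series.
# State values: 1 = beLong, 0 = beFlat (illustrative encoding).
def state_to_impulse(states):
    impulses = []
    prev = 0  # assume flat before the first bar
    for s in states:
        if s == 1 and prev == 0:
            impulses.append("Buy")    # entering a long position
        elif s == 0 and prev == 1:
            impulses.append("Sell")   # exiting to flat
        else:
            impulses.append(None)     # no change of position
        prev = s
    return impulses

print(state_to_impulse([0, 1, 1, 0, 1]))
# [None, 'Buy', None, 'Sell', 'Buy']
```

Note that the conversion loses nothing: the state series can always be rebuilt from the impulses, but state signals make the desired position explicit on every bar.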
The data we use is time series. Each data point represents a single price or a bar of data. The single price could be a quotation or a trade — a tick. The bar could be of any length of time and has prices (usually for trades) for any or all of open, high, low, close, volume. Importantly, the data is in time order when it arrives at your computer, and is processed in time order by your program.
Data vendors are a transitory group of businesses. Verify that the data vendor you plan to use is still in business, providing the service you want, with an acceptable API to your trading platform. The chart compares several with whom I have had recent experience and recommend.
Beware. There are vendors whose data is …
In traditional trading systems, they are called indicators; in machine learning systems, they are called predictors, independent variables, or features. They are the data series that we hope contains signals that precede profitable trades.
Assume we are working with daily data, and have a data point for each day. We want to develop a classification model that separates the data points into two classes — whether to hold a long position or remain flat. And assume we have an ideal indicator that has a range of values from 0 to 100. We are looking for a functional relationship where the value of the indicator determines the value of the class. An ideal indicator gives a perfect separation and the task is easy.
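With a perfectly separating indicator, the classification rule collapses to a single threshold. A minimal sketch, where the threshold of 50 is an illustrative assumption:

```python
# With an ideal indicator in [0, 100], one threshold separates the
# two classes perfectly. The threshold value 50 is an assumption
# for the sketch; a real indicator rarely separates this cleanly.
def classify(indicator_value, threshold=50):
    return "beLong" if indicator_value > threshold else "beFlat"

print(classify(80))  # beLong
print(classify(20))  # beFlat
```

Real indicators overlap the classes, which is why model fitting and an objective function are needed at all.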
Models — Introduction
The trading system development flowchart has two paths. One uses the traditional trading system development platform; the other machine learning.
Both paths begin with issue selection, data preparation, and transformations. Both lead through model fitting and validation, producing a set of trades that are the best estimate of future performance.
The upper path, indicator-based development, uses traditional trading system development platforms such as AmiBroker. The lower path uses machine learning.
The two have a fundamental difference. With traditional platforms, the technique begins with calculation of an indicator, then sees what happened after. With machine learning, the technique begins with identification of desirable trades, then sees what happened earlier. Both use models.
The workflow of the development phase using traditional trading system development platforms is:
- read data
- specify parameters
- compute indicators
- define rules
- generate signals
- produce trades
- evaluate performance
The goal of the model is to determine a set of rules that divide the data points into categories — those days when the rules tell the trader to have a long position and those days to remain flat. Importantly, the indicators are computed before the rules are evaluated and trades produced. In a word — compute indicators, then see what happened later.
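The workflow above can be sketched end to end in Python, with a simple moving-average rule standing in for the indicator and rule steps. The indicator, rule, and prices are illustrative assumptions, not a recommended system:

```python
# Schematic of the traditional workflow: read data, compute an
# indicator, apply a rule, generate state signals. The moving-average
# rule here is a stand-in chosen for the sketch.
def sma(prices, n):
    # simple moving average; None until n bars are available
    return [sum(prices[i - n + 1:i + 1]) / n if i >= n - 1 else None
            for i in range(len(prices))]

def generate_signals(prices, n=3):
    ma = sma(prices, n)                      # compute indicator
    # rule: be long (1) when the close is above its n-day average
    return [1 if m is not None and p > m else 0
            for p, m in zip(prices, ma)]

prices = [10, 11, 12, 11, 10, 11, 13]        # read data (illustrative)
print(generate_signals(prices))              # [0, 0, 1, 0, 0, 1, 1]
```

Trades and performance metrics would then be produced from the signal series, completing the evaluate-performance step.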
Backtesting has fit a model to a set of data. Optimization has tried many combinations of rules and parameters. The results look good. The result at the top of the optimization report is, by definition, the best.
While there is always a best, we do not know whether this particular best is good enough. We need a technique and some metrics to estimate the risk and profit potential of trading the system. The validation phase will help us learn about the predictive ability of the trading system.
The scientific method requires that the system be tested using new data. Validation data. Out-of-sample data. Data that has not been used before. Time series data needs special attention because the order of the data points must be handled properly. Financial time series need special attention because profitable trading of a system removes some of the inefficiency that the model was designed to identify. Over time, what were once profitable signals become more difficult to identify and less profitable to trade.
Machine Learning Models
We are performing supervised learning. Every data point has a target.
For the training phase, the target is known, and the machine learning modeling libraries find the best fit of the equations that relate the predictor variables to the target variable. That model is saved. For the validation phase, a set of data that is more recent than the training data and has not been previously used is passed to the model. The model returns the predicted value of the target, which is compared with the known value, giving a score for the accuracy of the model.
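The train-then-validate flow can be sketched with a deliberately tiny model: a one-nearest-neighbor classifier written in plain Python stands in for a real machine learning library, and the data values are illustrative assumptions:

```python
# Minimal sketch of the train/validate flow. The 1-nearest-neighbor
# "model" is a stand-in for a real ML library; data is illustrative.
def fit_1nn(X_train, y_train):
    # "fitting" a 1-NN model just stores the training examples
    return list(zip(X_train, y_train))

def predict_1nn(model, x):
    # predict the target of the closest training point
    return min(model, key=lambda pair: abs(pair[0] - x))[1]

# training phase: targets are known, model is fit and saved
X_train = [1.0, 2.0, 8.0, 9.0]
y_train = ["beFlat", "beFlat", "beLong", "beLong"]
model = fit_1nn(X_train, y_train)

# validation phase: more recent, previously unused data is scored
X_val, y_val = [1.5, 8.5], ["beFlat", "beLong"]
preds = [predict_1nn(model, x) for x in X_val]
accuracy = sum(p == y for p, y in zip(preds, y_val)) / len(y_val)
print(accuracy)  # 1.0
```

A real library replaces `fit_1nn` and `predict_1nn`, but the division of work between the two phases is the same.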
Using daily data and predicting whether tomorrow’s close will be higher than today’s close, we have a target for every day. The model will provide a prediction for every day. We compute results and mark-to-market every day. We have an opportunity to adjust our position every day. Rather than looking for impulse signals to buy and sell, we are looking for state signals to beLong or beFlat. Machine learning fits perfectly with the trading system development, trading management techniques, and metrics of safe-f and CAR25.
Iris Data in a Jupyter Notebook
The machine learning examples posted here use the Anaconda distribution of python version 3. That installation includes all the libraries that will be needed, as well as both the Spyder and Jupyter notebook environments.
These tutorials use Jupyter notebooks. Each notebook is converted to an html document that contains the Jupyter code and output. Links on these webpages open the html documents. You can copy the python code, paste it into your own development editor, and replicate the output on your own computer.
The first two tutorials use the Iris data.
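A first look at that data, assuming scikit-learn is installed (it ships with the Anaconda distribution mentioned above); the choice of a decision tree and a 70/30 split are assumptions for the sketch, not necessarily what the tutorials use:

```python
# Load the Iris data, split it, fit a simple classifier, and score it
# on held-out data. Model choice and split sizes are illustrative.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
print(f"validation accuracy: {model.score(X_test, y_test):.2f}")
```

Running this in a Jupyter notebook cell reproduces the basic pattern of every later example: fit on one set of data, score on another.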
Financial Data for Development
During development, you will be testing many alternative models and, in the process, making multiple data requests. You do not need current data, but you do need data that is consistent throughout the period you are experimenting with.
You need a stable set of data: one that has been inspected and corrected as necessary, and one you can count on to remain consistent and unchanged from run to run, even over a period of weeks.
You need freedom to make as many data requests as you need without worry about incurring high data cost, high transmission cost, or violating limits imposed by the vendor.
I recommend that your development database be local and stored on your computer.
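One simple way to do that is to fetch each symbol once, write it to a local CSV file, and read from disk on every later run. In this sketch, `fetch_daily_bars` is a hypothetical stand-in for your vendor's API call; only the stub shown here is defined:

```python
# Keep a stable local copy of development data: fetch once, cache as
# CSV, and serve all later requests from disk.
import csv
import os

def fetch_daily_bars(symbol):
    # hypothetical placeholder for a vendor API call; a real version
    # would return rows of date, open, high, low, close, volume
    return [["2020-01-02", 100, 101, 99, 100.5, 1000000]]

def load_bars(symbol, path="data"):
    fname = os.path.join(path, f"{symbol}.csv")
    if not os.path.exists(fname):
        # first request only: hit the vendor and cache the result
        os.makedirs(path, exist_ok=True)
        with open(fname, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["date", "open", "high", "low", "close", "volume"])
            writer.writerows(fetch_daily_bars(symbol))
    # every run after that reads the identical local copy
    with open(fname, newline="") as f:
        return list(csv.DictReader(f))
```

Because every run after the first reads the same file, backtests stay reproducible and no vendor request limits are consumed during experimentation.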
In the early days of technical analysis — 1980s and early 90s — the markets were much less efficient than they are now. Historical data was scarce and expensive, bid-ask spreads were wide, commissions were high, holding periods were long, and portfolios were in vogue.
Richard Dennis and William Eckhardt taught neophytes to trade the Turtle method. Their rules fit on one page — buy breakouts to new highs and short breakouts to new lows. Chart readers watched moving averages and trend lines. These techniques, and others like them, identified profitable trading opportunities visually and with simple formulas.