Trade Quality — Metrics

During development of a trading system, we fit the model to the in-sample training data. Parameters are adjusted, and each unique set of parameters defines an associated model. We are searching for the model that best fits the data. Best is defined by a formula: an objective function computed from the metrics associated with the trades, such as the percentage of trades that are winners, the ratio of average win to average loss, maximum drawdown, etc.
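As a minimal sketch of how an objective function ranks candidate models, the Python below scores each parameter set by a hypothetical objective (accuracy multiplied by the ratio of average win to average loss) and keeps the highest-scoring one. The formula, the names `objective` and `best_model`, the parameter pairs, and the trade lists are all illustrative assumptions, not the author's objective function or data.

```python
from statistics import mean

def objective(trades):
    """Hypothetical objective: accuracy times the ratio of average win to
    average loss. `trades` holds per-trade percentage gains (1.5 means +1.5%)."""
    wins = [t for t in trades if t > 0]
    losses = [-t for t in trades if t <= 0]  # flat trades count as losses
    if not wins:
        return 0.0
    if not losses:
        return float("inf")  # no losing trades; treat the score as unbounded
    accuracy = len(wins) / len(trades)
    return accuracy * mean(wins) / mean(losses)

def best_model(candidates):
    """Return the parameter set whose trades score highest under `objective`."""
    return max(candidates, key=lambda params: objective(candidates[params]))

# Made-up trade lists keyed by hypothetical (lookback, threshold) pairs.
candidates = {
    (2, 30): [1.2, -0.8, 0.9, 1.1, -1.5],
    (3, 25): [0.7, 0.4, -0.3, 0.6, -0.9],
    (4, 20): [2.0, -2.5, 1.8, -1.9, 0.5],
}
print(best_model(candidates))
```

Changing the objective changes which model is "best", which is why the choice of metrics discussed in this post matters.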

One of the general principles of modeling and forecasting is the assumption that the out-of-sample data later processed by the fitted model will be similar to the training data. For technical analysis in particular, profitable systems depend on the future resembling the past: when the model is applied to new data, the trade results should have the same characteristics as those found in the training data during development. Before going further, we need to understand three important concepts:

  • distribution
  • population versus sample
  • stationarity

Distribution

The model processes the data, identifying patterns described by the rules of the model that precede profitable trades. While we cannot expect the same patterns and trades to occur in the same order in the future as they did in the training data, we rely on the distribution of signals and trades being the same in the future. We expect that there will be approximately the same number of trades per year, the same winning percentage, the same average gain, etc. We can visualize the distribution in several ways. One is to sort the trades from worst to best and plot the percentage gain per trade, one bar per trade.
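The sorted-bar view can be sketched in plain Python. This minimal version, assuming each trade is recorded as a percentage gain, sorts the trades from worst to best and renders one text bar per trade; the function name `sorted_bar_chart` and the trade values are made up for illustration.

```python
def sorted_bar_chart(trades, width=40):
    """Sort per-trade percentage gains from worst to best and render one text
    bar per trade: losses extend left of the axis, gains extend right."""
    ordered = sorted(trades)
    largest = max(abs(t) for t in ordered) or 1.0  # avoid dividing by zero
    half = width // 2
    lines = []
    for t in ordered:
        n = round(abs(t) / largest * half)  # bar length scaled to the largest trade
        if t < 0:
            bar = " " * (half - n) + "#" * n + "|" + " " * half
        else:
            bar = " " * half + "|" + "#" * n + " " * (half - n)
        lines.append(f"{bar} {t:+.2f}%")
    return ordered, lines

trades = [0.8, -1.9, 2.4, 0.3, -0.6, 1.1, -0.2]  # made-up percentage gains
ordered, lines = sorted_bar_chart(trades)
print("\n".join(lines))
```

With matplotlib installed, `plt.bar(range(len(ordered)), ordered)` draws the same picture graphically.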

For the charts displayed in this posting, the trades are the out-of-sample trades for the period 1/1/2008 through 12/8/2017 produced by the “RSIExampleForBlog” system described in the post “Why use AmiBroker.” The data series is XLF, using data supplied by Norgate Premium Data. You can copy the code listing from that post, paste it into your copy of AmiBroker, and run it yourself. The system works well for many issues, including US major market ETFs such as SPY and sector ETFs such as XLF.

By defining equal-width bins, we can assign trades to the appropriate bin, count the number in each bin, and form a histogram. Normalized so that the bin values sum to one, the histogram is an estimate of the probability mass function (pmf) or probability density function (pdf) of the trades. The distinction between pmf and pdf depends on whether the data is discrete or continuous. Assuming it will cause no confusion, I will refer to them as probability density functions, or pdfs. Note that these are pdfs in the statistical sense, not to be confused with the PDF file format for documents.
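The binning step can be sketched in a few lines of Python, assuming trades are percentage gains. The hypothetical `histogram` function splits the range of the data into equal-width bins, counts the trades in each, and divides by the number of trades so the relative frequencies sum to one (an empirical pmf). Dividing further by the bin width would give a density estimate instead. NumPy users can get the counts and edges from `numpy.histogram(trades, bins=n_bins)`.

```python
def histogram(trades, n_bins):
    """Assign each trade to one of n_bins equal-width bins spanning the data
    range; return bin edges, counts, and relative frequencies (summing to 1)."""
    lo, hi = min(trades), max(trades)
    width = (hi - lo) / n_bins or 1.0  # guard against all trades being equal
    edges = [lo + i * width for i in range(n_bins + 1)]
    counts = [0] * n_bins
    for t in trades:
        # The maximum value sits on the last edge; keep it in the final bin.
        i = min(int((t - lo) / width), n_bins - 1)
        counts[i] += 1
    freqs = [c / len(trades) for c in counts]
    return edges, counts, freqs

trades = [-2.0, -1.0, -0.5, 0.0, 0.5, 0.5, 1.0, 3.0]  # made-up percentage gains
edges, counts, freqs = histogram(trades, n_bins=5)
print(counts)
print(freqs)
```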

In general, the form of the pdf for a set of trades depends on the rules controlling the trades, particularly the exits. If the trades are close-to-close percentage profit or loss, the pdf is similar to the common bell-shaped curve. However familiar it looks, it is not safe to assume that the distribution follows the Normal statistical distribution; financial data and trade results seldom do.

Each pdf can be transformed into its associated cumulative distribution function (CDF) by forming a running sum of the bin counts from lowest to highest bin, then normalizing so that the range is 0.0 to 1.0, or 0% to 100%. It looks like this, first as bars, then as a line. We will use the line format most often.
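The running-sum-and-normalize step is short in Python. This sketch assumes the bin counts are already available (the counts here are made up) and uses `itertools.accumulate` for the running sum; the function name `cdf_from_counts` is hypothetical.

```python
from itertools import accumulate

def cdf_from_counts(counts):
    """Running sum of bin counts from lowest to highest bin, normalized so
    that the final value is 1.0."""
    total = sum(counts)
    return [c / total for c in accumulate(counts)]

counts = [1, 2, 3, 1, 1]  # made-up bin counts
cdf = cdf_from_counts(counts)
print(cdf)  # [0.125, 0.375, 0.75, 0.875, 1.0]
```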

As we progress to the analysis of risk and profit potential, we will be computing CDFs and making estimates from them. The math and interpretation are easier if we form the inverse CDF. Imagine plotting the CDF on a sheet of paper, holding it by its lower left and upper right corners, rotating it front to back, and viewing the graph through the paper. The axes are swapped: probability is now on the horizontal axis and trade gain on the vertical axis. It looks like this.
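Swapping the axes turns a question like "what fraction of trades lose more than 1%?" into "what gain marks the worst 25% of trades?" A minimal empirical version, assuming a plain list of percentage gains, returns the smallest trade x for which the fraction of trades at or below x is at least p. The function name `inverse_cdf` and the data are illustrative; other quantile conventions (with interpolation between trades) also exist.

```python
import math

def inverse_cdf(trades, p):
    """Empirical inverse CDF (quantile function): the smallest trade x such
    that the fraction of trades <= x is at least p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    ordered = sorted(trades)
    # round() absorbs floating-point noise such as 0.3 * 10 = 3.0000000000000004
    k = max(math.ceil(round(p * len(ordered), 9)), 1)  # 1-based rank
    return ordered[k - 1]

trades = [0.5, -2.0, 3.0, -0.5, 0.0, 1.0, -1.0, 0.5]  # made-up percentage gains
for p in (0.0, 0.25, 0.5, 1.0):
    print(p, inverse_cdf(trades, p))
```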

For the model to be useful in predicting the future, the distribution of future trades must be similar to the distribution found during training and validation on historical data. There are statistical techniques for comparing distributions and estimating whether they are the same. We will be making those comparisons as we go along.
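One widely used technique, chosen here for illustration and not necessarily the one used later in this series, is the two-sample Kolmogorov-Smirnov test. Its statistic is the largest vertical gap between the two empirical CDFs. The sketch below computes only the statistic; the sample values are made up. For a p-value, `scipy.stats.ks_2samp(a, b)` returns both the statistic and the p-value.

```python
from bisect import bisect_right

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    gap = 0.0
    # Both empirical CDFs are step functions that change only at sample
    # points, so checking every pooled point finds the largest gap.
    for x in a + b:
        fa = bisect_right(a, x) / len(a)  # fraction of a at or below x
        fb = bisect_right(b, x) / len(b)  # fraction of b at or below x
        gap = max(gap, abs(fa - fb))
    return gap

in_sample = [1.2, -0.8, 0.9, 1.1, -1.5, 0.4, 2.0]  # made-up percentage gains
out_of_sample = [0.7, -1.1, 0.3, 1.6, -0.2]        # made-up percentage gains
print(ks_statistic(in_sample, out_of_sample))
```

For large samples of sizes n and m, a gap larger than about 1.36 * sqrt((n + m) / (n * m)) is the usual threshold for rejecting "same distribution" at the 5% level.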

Population versus sample

As we use them, population and sample refer to data. We can think of them as different views of a set of data, differing in how much of the data is visible to us. Population is the entire set of data. Sample is a subset of the population, usually the specific subset that we are working with. Typically, the population is much larger than the sample and cannot be completely known.

We make estimates of characteristics of the population from measurements of the sample. For example, we might like to know the average height of 25-year-old males in the US. It is impractical to measure the height of every 25-year-old male in the US, the population. But we can measure the height of a subset of that population that we hope will be representative, the sample. We infer the distribution of the height of the population from the distribution of the height of the sample. As any reference on statistical techniques will explain, estimates of the population based on measurements of the sample are uncertain. Their accuracy depends on the techniques used in choosing the sample and on the size of the sample.

In some situations, particularly when models are created to describe a relationship rather than to predict, the population is completely known. An example is an analysis of who survived the sinking of the Titanic, a well-known data set used to illustrate machine learning. In these cases, there is no other data, and the sample and population are the same. In trading, we see only a portion of all the trades that could result from our model. We are working with a sample and inferring characteristics of the population.

Given a population, many samples can be chosen from it. There will be variation among the samples.
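Sample-to-sample variation is easy to see with a simulation. This sketch builds a synthetic population of 100,000 made-up trade results (Normal, chosen only for convenience, not because trades follow it), draws 1,000 samples of 50 trades, and reports how much the sample means vary around the population mean. The population parameters and sample sizes are arbitrary assumptions.

```python
import random
from statistics import mean, stdev

random.seed(1)  # fixed seed so the run is repeatable

# Synthetic population of made-up per-trade percentage gains.
population = [random.gauss(0.2, 1.5) for _ in range(100_000)]

# Draw many samples of 50 trades each and record each sample's mean.
sample_means = [mean(random.sample(population, 50)) for _ in range(1_000)]

print(f"population mean:           {mean(population):.3f}")
print(f"range of sample means:     {min(sample_means):.3f} to {max(sample_means):.3f}")
print(f"std. dev. of sample means: {stdev(sample_means):.3f}")
```

The spread of the sample means shrinks roughly in proportion to one over the square root of the sample size, which is one reason the number of trades matters.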

The important point here is that whatever set of trades we observe from a system, that set is a sample, not the population.

Stationarity

Stationarity is a characteristic of a set of data or a distribution, and it is always defined with respect to some metric. A distribution is said to be stationary with respect to a metric if similarly formed subsets have the same value of that metric. Consider the price of a stock, such as Apple, that has risen over time. The set of Apple prices is not stationary with respect to the mean: the mean price over an early period differs from the mean over a later one. Financial data is typically not stationary with respect to price. If a trading system rule relied on price level, it would fail, because the future distribution of price does not resemble the past distribution of price.
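A crude check compares a metric across similarly formed subsets, here the first and second halves of a series. The sketch generates a made-up rising price series (not Apple or any real issue) from daily percentage changes with a small positive drift. The mean price differs greatly between halves, while the mean daily change is nearly the same. Comparing only the mean is a simplification; a real analysis would examine other metrics and subsets too.

```python
import random
from statistics import mean

random.seed(7)  # fixed seed so the run is repeatable

# Synthetic daily percentage changes with a small positive drift, compounded
# into a rising "price" series. Made-up data for illustration only.
changes = [random.gauss(0.1, 1.0) for _ in range(2_000)]
prices = [100.0]
for c in changes:
    prices.append(prices[-1] * (1 + c / 100))

def half_means(series):
    """Mean of the first half and of the second half of a series."""
    mid = len(series) // 2
    return mean(series[:mid]), mean(series[mid:])

print("price:          first half %.2f, second half %.2f" % half_means(prices))
print("percent change: first half %.3f, second half %.3f" % half_means(changes))
```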

Price is not stationary, but that might not matter. Whatever signals we identify as preceding profitable trades must be stationary. Those signals are the part of the future that must resemble the past.

Applying techniques that assume the data is stationary when it is not will introduce an unfavorable bias.

I have posted a video to YouTube that may help you further understand stationarity and its importance.

Choosing metrics

Putting these three concepts together, we can begin to choose metrics that are representative of the population, are stationary, and will be helpful in selecting good models.

Any single equity curve is formed from a sequence of trades in time order. We are processing price and volume data, using our rules to select the sample of trades associated with that price data, from which we will estimate characteristics of the population of trades, and estimate other possible sets and sequences of trades and equity curves. That is — we assume the future set of trades will be similar to the past set of trades since they are drawn from the same population. But we cannot assume the same sequence, and we must be prepared for some variation.

The trades resulting from the training data might have occurred in a different order, so we cannot rely on characteristics of the set of trades that depend on the order in which the trades occurred. Final equity of a set of trades is independent of order, but drawdown depends on order, so the drawdown of any single observed equity curve (such as one from a backtest) is not a good metric. However, we can take the set of trades from that test, draw many equally likely sequences of trades from it, compute the equity curve of each, measure the drawdown of each, and compute metrics of the resulting distribution of drawdowns. Whether that is possible depends on the capabilities of the development platform. For most traditional platforms it is not easily accomplished; in Python it is easy.
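A minimal Python sketch of that procedure, assuming trades are percentage gains on equal-sized positions, compounds each sequence into an equity curve and records its maximum drawdown. Here "equally likely sequences" are produced by shuffling the observed trades (random permutations), which keeps final equity fixed while drawdown varies. Resampling with replacement is a common alternative that also varies final equity. The function names and trade values are hypothetical.

```python
import random

def equity_curve(trades, start=1.0):
    """Compound per-trade percentage gains into an equity curve."""
    curve = [start]
    for t in trades:
        curve.append(curve[-1] * (1 + t / 100))
    return curve

def max_drawdown(curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    peak, worst = curve[0], 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst

def drawdown_distribution(trades, n_sequences=1_000, seed=0):
    """Sorted max drawdowns of many random reorderings of the same trades."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_sequences):
        seq = trades[:]  # copy so the original order is untouched
        rng.shuffle(seq)
        results.append(max_drawdown(equity_curve(seq)))
    return sorted(results)

trades = [1.2, -0.8, 2.1, -1.5, 0.6, -0.4, 1.7, -2.2, 0.9, 0.3]  # made-up
dd = drawdown_distribution(trades)
print(f"observed-order max drawdown: {max_drawdown(equity_curve(trades)):.2%}")
print(f"median: {dd[len(dd) // 2]:.2%}, 95th percentile: {dd[int(0.95 * len(dd))]:.2%}")
```

The single observed drawdown is just one draw from this distribution; a percentile of the distribution is a more robust metric.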

We have already seen that trade-by-trade position size belongs in the trading management system rather than the trading system. Metrics will be skewed if the trading system model has a position size parameter, so we will work with equal-sized trades. The percentage gained or lost per trade works well for shares; dollars gained or lost per contract works well for futures.
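Measuring equal-sized trades is a matter of expressing each result per unit rather than per position. This sketch assumes long trades and ignores commissions and slippage; the function names are hypothetical, and the point value (dollars per full point of price) must be looked up for each futures contract. The value 50 used below is only an example.

```python
def pct_gain_long(entry_price, exit_price):
    """Percentage gained or lost by a long stock trade, independent of the
    number of shares."""
    return (exit_price - entry_price) / entry_price * 100

def dollars_per_contract_long(entry_price, exit_price, point_value):
    """Dollars gained or lost per futures contract for a long trade.
    point_value is the contract-specific dollar value of one full point."""
    return (exit_price - entry_price) * point_value

print(pct_gain_long(25.00, 25.50))
print(dollars_per_contract_long(2650.0, 2662.5, point_value=50.0))
```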

Some metrics that are useful and independent of sequence include:

  • number of trades 
  • accuracy — percentage of trades that are winners
  • percentage gained or lost
  • ratio of average win to average loss
  • holding period
  • maximum adverse excursion
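The metrics in the list above can be computed from a set of trade records without reference to their order. In this sketch the `Trade` record and its fields are hypothetical, and summarizing maximum adverse excursion by its worst value (rather than, say, its mean) is a choice made for illustration. Because every metric is a count, mean, ratio, or maximum, reordering the trades leaves the results unchanged.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Trade:
    pct_gain: float  # percentage gained (+) or lost (-) on the trade
    bars_held: int   # holding period, in bars
    mae_pct: float   # maximum adverse excursion, as a positive percentage

def sequence_free_metrics(trades):
    """Metrics that do not depend on the order in which trades occurred."""
    wins = [t.pct_gain for t in trades if t.pct_gain > 0]
    losses = [-t.pct_gain for t in trades if t.pct_gain <= 0]
    return {
        "number_of_trades": len(trades),
        "accuracy": len(wins) / len(trades),
        "mean_pct_gain": mean(t.pct_gain for t in trades),
        "win_loss_ratio": mean(wins) / mean(losses) if wins and losses else None,
        "mean_bars_held": mean(t.bars_held for t in trades),
        "worst_mae_pct": max(t.mae_pct for t in trades),
    }

trades = [  # made-up trade records
    Trade(1.5, 3, 0.4),
    Trade(-1.0, 5, 1.8),
    Trade(0.5, 2, 0.2),
    Trade(2.0, 4, 0.9),
]
m = sequence_free_metrics(trades)
print(m)
```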

Next — safe-f and CAR25
