Data

The data we use is time series.  Each data point represents a single price or a bar of data.  The single price could be a quotation or a trade (a tick).  A bar can cover any length of time and carries any or all of the open, high, low, and close prices (usually trade prices), plus volume.  Importantly, the data arrives at your computer in time order, and your program processes it in time order.
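As a minimal sketch of the description above, a bar can be held in a small record whose price and volume fields are optional, with a check that a series is in time order.  The names `Bar` and `is_time_ordered` and the sample values are hypothetical, made up for illustration, and do not come from any vendor's API.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass(frozen=True)
class Bar:
    """One bar of data; every field except the timestamp may be missing."""
    timestamp: datetime
    open: Optional[float] = None
    high: Optional[float] = None
    low: Optional[float] = None
    close: Optional[float] = None
    volume: Optional[int] = None


def is_time_ordered(bars: List[Bar]) -> bool:
    """Return True if each timestamp is strictly later than the one before it."""
    return all(a.timestamp < b.timestamp for a, b in zip(bars, bars[1:]))


# Made-up sample values, for illustration only.
bars = [
    Bar(datetime(2024, 1, 2), 100.0, 101.5, 99.5, 101.0, 1_200_000),
    Bar(datetime(2024, 1, 3), 101.0, 102.0, 100.5, 100.8, 950_000),
    Bar(datetime(2024, 1, 4), close=25.31),  # a bar with only a close
]

in_order = is_time_ordered(bars)
```

A check like this is cheap to run on every file you load, before any features are computed from it.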

Data vendors are a transitory group of businesses.  Verify that the data vendor you plan to use is still in business, still provides the service you want, and offers an acceptable API to your trading platform.  The chart compares several vendors that I have used recently and recommend.

Beware.  There are vendors whose data is low quality, whose delivery is unreliable, and who install applications of questionable safety on your computer. 

Type of Vendor

The original source of the data is the exchange on which it was traded.  The data is retrieved from the exchanges by data vendors, repackaged, and made available to you. 

Yahoo and Google are general purpose websites that host financial data as one of their services.  They display charts of historical data, post latest data during the trading day, and (sometimes) provide data that can be downloaded. 

Norgate, DTNIQ, and Quandl are in the business of supplying data.  They treat data and its delivery more seriously.  

Interactive Brokers (IB) is a brokerage.  Although data is not their primary business, they are serious about providing high quality data to their brokerage customers.  They also provide “practice” data to anyone who wishes to use it.

Development versus Trading

During development, you will be using many data points, each one of which represents the price at some point in the past.  Your program accepts the data, transforms it into features and predictor variables per your code, adjusts and fits models to it, then validates that the model has found patterns that precede profitable trades.  You do not need the very latest data values during development.  All of these vendors supply historical data that can be used in the development phase.

During live trading, you will be using the same data series (the same tickers) that you used during development, but not necessarily covering as long a time period.  Your previously fitted model will process the newly received data and report the latest signals.  You do need the very latest data values during live trading.  DTNIQ and IB supply real-time data that can be used by your fitted model to generate trade signals.

Consistency of data is important.  Ideally, use the same data series from the same data vendor for trading as were used for development.  You might want to use lower cost historical data for early stages of development, then switch to the vendor you will use to feed your live trading system for a final validation before moving your system from development to trading.
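One way to carry out that final validation is to compare the two vendors' closing prices date by date before switching.  The following is a rough sketch only: the function name `compare_closes`, the relative tolerance, and the sample prices are all assumptions made for illustration, and how you load each vendor's data is left out.

```python
from typing import Dict, List, Tuple


def compare_closes(
    dev: Dict[str, float],
    live: Dict[str, float],
    tolerance: float = 0.001,
) -> List[Tuple[str, float, float]]:
    """Return (date, dev_close, live_close) for shared dates whose closes
    differ by more than `tolerance`, measured relative to the live close."""
    mismatches = []
    for date in sorted(dev.keys() & live.keys()):  # ISO dates sort correctly as text
        a, b = dev[date], live[date]
        if abs(a - b) > tolerance * abs(b):
            mismatches.append((date, a, b))
    return mismatches


# Made-up closes keyed by ISO date; the last date disagrees by a factor of two,
# the kind of gap that could come from different adjustment conventions.
dev_closes = {"2024-01-02": 101.00, "2024-01-03": 100.80, "2024-01-04": 50.60}
live_closes = {"2024-01-02": 101.00, "2024-01-03": 100.81, "2024-01-04": 101.20}

mismatches = compare_closes(dev_closes, live_closes)
```

Small differences within the tolerance are ignored; anything reported is worth investigating before you trust a model fitted on one series to trade on the other.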

End-of-day versus Intra-day

End-of-day (EOD) data has one data point per trading day.  For most vendors and data series, the data point is the set of OHLCV values.  Traditional mutual funds have only one value per day: the close.  All of these vendors offer end-of-day data.  Data for the most recent trading day becomes available once the vendor has processed it, and after an intentional delay long enough to separate EOD service from real-time service.  That is typically 30 minutes to 2 hours after the close of trading.

Intra-day data consists of several to many data points, each representing some period shorter than a full trading day.  Depending on the data vendor and issue, these might range between 1-second bars and 1-hour bars.  Most programs that support trading system development have utility programs that accept data at a fine resolution, of say 1 minute, and summarize it into longer bars, of say 1 hour — or even a full trading day.  Yahoo and Google have an uneven history of providing intra-day data.  Norgate provides several snapshots (for Australian markets only) at intervals throughout the trading day, delivered after a delay of about 30 minutes.  Quandl’s intra-day data is historical only.  DTNIQ and IB provide streaming intra-day data that can be fed directly into your system to signal live real-time trades.
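The summarizing step mentioned above is simple to sketch.  The example below, written with only the Python standard library, combines time-ordered 1-minute bars into hourly bars: the open comes from the first minute, the high and low are the extremes, the close comes from the last minute, and volume is summed.  The function name `summarize_to_hourly`, the tuple layout, and the sample values are assumptions for illustration; your development platform's own utility may align bars differently (for example, to the session open rather than the clock hour).

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple

# (timestamp, open, high, low, close, volume)
BarTuple = Tuple[datetime, float, float, float, float, int]


def summarize_to_hourly(minute_bars: List[BarTuple]) -> List[BarTuple]:
    """Combine time-ordered 1-minute bars into bars stamped at the start of each clock hour."""
    hourly: Dict[datetime, list] = {}
    for ts, o, h, l, c, v in minute_bars:
        key = ts.replace(minute=0, second=0, microsecond=0)
        if key not in hourly:
            hourly[key] = [key, o, h, l, c, v]  # first minute sets the open
        else:
            bar = hourly[key]
            bar[2] = max(bar[2], h)  # highest high
            bar[3] = min(bar[3], l)  # lowest low
            bar[4] = c               # latest close so far
            bar[5] += v              # total volume
    # dicts keep insertion order, so the output stays in time order
    return [tuple(bar) for bar in hourly.values()]


# Made-up 1-minute bars, for illustration only.
start = datetime(2024, 1, 2, 9, 30)
minute_bars = [
    (start, 100.0, 100.4, 99.9, 100.2, 500),
    (start + timedelta(minutes=1), 100.2, 100.9, 100.1, 100.8, 300),
    (start + timedelta(minutes=30), 100.8, 101.0, 100.3, 100.5, 200),
    (start + timedelta(minutes=31), 100.5, 100.6, 100.0, 100.1, 400),
]

hourly_bars = summarize_to_hourly(minute_bars)
```

The same pattern extends to daily bars by truncating the timestamp to the date instead of the hour.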

Streaming versus Snapshot

Streaming data refers to a real-time stream of individual quotations as they occur tick-by-tick.  (Subscriptions for “delayed” real-time data are available from some data vendors.  Delayed real-time data is an oxymoron.)  Snapshot data may appear to be streaming, but data points are consolidated and individual ticks may be omitted.  

DTNIQ receives data feeds from the exchanges and clearing agencies, then feeds all ticks to its customers with as little delay as possible.  IB also receives data from the exchanges and clearing firms, but it consolidates data into very short snapshots (1 or 2 seconds) and distributes them without further delay.  The consolidation probably does not matter, unless your application uses “tick volume” (a count of individual ticks) as an indicator.
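To see why consolidation affects a tick count but not much else, consider this small sketch.  It collapses a list of ticks into one snapshot per interval, keeping the last price and the summed trade size.  It is an illustration of the general idea only: the function name `consolidate`, the snapshot fields, and the sample ticks are assumptions, and it does not represent any vendor's actual feed format.

```python
from typing import Dict, List, Tuple

# (seconds since the session opened, price, size)
Tick = Tuple[float, float, int]


def consolidate(ticks: List[Tick], interval: float = 1.0) -> List[dict]:
    """Collapse time-ordered ticks into one snapshot per interval."""
    snapshots: Dict[int, dict] = {}
    for t, price, size in ticks:
        bucket = int(t // interval)
        snap = snapshots.setdefault(
            bucket, {"time": bucket * interval, "last": price, "volume": 0}
        )
        snap["last"] = price     # the latest price in the interval wins
        snap["volume"] += size   # traded volume is preserved
    return list(snapshots.values())


# Made-up ticks, for illustration only.
ticks = [(0.10, 50.00, 100), (0.40, 50.01, 200), (0.90, 50.00, 100), (1.20, 50.02, 300)]
snapshots = consolidate(ticks)

ticks_seen_streaming = len(ticks)      # every tick counted
ticks_seen_snapshot = len(snapshots)   # one update per interval
```

Total volume and the last price survive consolidation, but a program that counts incoming updates sees fewer of them than there were ticks, so a tick-volume indicator computed from the two feeds would differ.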

Local Storage versus Vendor Storage

Most of the vendors maintain the master data files on their own site and distribute data from those files when you request it, on your schedule.  If you wish, you can create your own copies of those data files on your computers.  Any changes you make to your local copy will be overwritten when the vendor distributes fresh data.  Norgate is the only data vendor among these that maintains data on your computer.  Norgate distributes data on its own schedule: it writes the data to your computer, then disconnects.

Links

References and Tutorials

Interactive Brokers Tutorial Webinars

Recordings of webinars, including the Python API for IB

Sentdex (Harrison Kinsley)  www.pythonprogramming.net

Loading stock data into Python using Pandas.

QuantStart (Mike Halls-Moore) https://www.quantstart.com

Using DTN IQ with Python

AmiBroker

How to get quotes from various markets

Simulated Data

You will hear recommendations to use simulated data to develop your trading systems.  Simulated data has no value in training trading models.  The markets we trade are nearly efficient.  The signals we seek are weak and embedded in a very noisy background.  If we knew how to construct simulated data that would identify profitable trades in real data, we would already have all the information we need, and we could code that into our model directly. 
