Financial Data for Development

During development, you will be testing many alternative models.  In the process, you will be making multiple data requests.  You do not need current data, but you do need data that stays consistent throughout the period in which you are experimenting.

You need a stable set of data: one that has been inspected and corrected as necessary, and one you can count on not to change from run to run, even over a period of weeks.

You need the freedom to make as many data requests as you need without worrying about incurring high data costs, high transmission costs, or violating limits imposed by the vendor.

I recommend that your development database be local and stored on your computer.
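A minimal sketch of what a local development store might look like, assuming one CSV file per ticker and the pandas library.  The function names, the directory, and the price values are illustrative assumptions, not part of any vendor's tooling.

```python
from pathlib import Path
import tempfile

import pandas as pd


def save_prices(prices: pd.DataFrame, data_dir: Path, ticker: str) -> Path:
    """Write one ticker's price history to <data_dir>/<ticker>.csv."""
    data_dir.mkdir(parents=True, exist_ok=True)
    path = data_dir / f"{ticker}.csv"
    prices.to_csv(path, index_label="Date")
    return path


def load_prices(data_dir: Path, ticker: str) -> pd.DataFrame:
    """Read a ticker's price history back from local disk, indexed by date."""
    path = data_dir / f"{ticker}.csv"
    return pd.read_csv(path, index_col="Date", parse_dates=True)


# Illustrative values only, not real quotes.
sample = pd.DataFrame(
    {"Close": [100.0, 101.5, 99.75]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04", "2017-01-05"]),
)

# Hypothetical location; in practice this would be a fixed folder you control.
data_dir = Path(tempfile.mkdtemp()) / "dev_data"
save_prices(sample, data_dir, "AAPL")
reloaded = load_prices(data_dir, "AAPL")
```

Once the file is on disk, every experiment reads the same rows, and no run depends on a vendor request.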

Think ahead to trading.  Ideally, the data series used for development and the data series used for live trading come from the same data source.  If you plan to download data after the close of trading, generate signals, and enter trades the next day, Norgate, Quandl, Google, and Yahoo will all work with downloads at your leisure, since they all update their data well after the close of trading.  If you plan to gather the day’s data, generate signals, and place trades at the close or very shortly after, you will need to make some modifications.  One option is to collect the data manually and update the database from entries made at a terminal.

The links that follow point to Jupyter notebooks that describe techniques to download data, store it on your local computer, and read it from local disc into your development programs.

For a given ticker, say AAPL, the data downloaded from Norgate, Quandl, Google, and Yahoo have significant differences.  The differences appear to be related to the method Yahoo uses to compute its “Adjusted Close” series.  Yahoo and Quandl data series are very close to each other over the 30 years of historical data I examined.  For any given date, Yahoo “Adjusted Close” is consistently lower than the Norgate and Google values: nearly the same for recent data, 10% lower for data older than August 2012.  Norgate and Google data series are very close to each other.
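One way to quantify differences like these is to align two series on their common dates and compute the percent difference.  The sketch below assumes pandas and uses made-up numbers chosen only to mimic the pattern described (about 10% lower in older data, nearly equal in recent data); `percent_difference`, `source_a`, and `source_b` are hypothetical names.

```python
import pandas as pd


def percent_difference(a: pd.Series, b: pd.Series) -> pd.Series:
    """Percent by which series a differs from series b on the dates both have."""
    a_aligned, b_aligned = a.align(b, join="inner")
    return 100.0 * (a_aligned - b_aligned) / b_aligned


# Illustrative values only, not real quotes from any vendor.
dates = pd.to_datetime(["2012-01-03", "2012-01-04", "2017-01-03", "2017-01-04"])
source_a = pd.Series([45.0, 45.9, 116.0, 115.8], index=dates)
source_b = pd.Series([50.0, 51.0, 116.0, 116.0], index=dates)

diff = percent_difference(source_a, source_b)
print(diff.round(2))
```

Plotting `diff` over the full history would make any date at which the two sources diverge easy to spot.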

Surprisingly, although Quandl free data seems to be very close to Yahoo, Quandl was missing several days of data in 2017.
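Gaps of this kind can be found by comparing the dates in one source against the dates in another.  This is a minimal sketch assuming pandas and date-indexed data frames; the function name and the sample dates are invented for illustration and do not identify the days actually missing.

```python
import pandas as pd


def missing_dates(candidate: pd.DataFrame, reference: pd.DataFrame) -> pd.Index:
    """Dates present in the reference data but absent from the candidate."""
    return reference.index.difference(candidate.index)


# Illustrative values only, not real quotes.
reference = pd.DataFrame(
    {"Close": [139.5, 139.0, 138.7, 139.1, 139.2]},
    index=pd.to_datetime(
        ["2017-03-06", "2017-03-07", "2017-03-08", "2017-03-09", "2017-03-10"]
    ),
)
# Simulate a source that skipped one trading day.
candidate = reference.drop(index=pd.Timestamp("2017-03-08"))

gaps = missing_dates(candidate, reference)
```

An empty result means the candidate has every date the reference has; it does not show that the reference itself is complete.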

Norgate data requires a subscription.  Quandl, Google, and Yahoo are free.  Of the four, Norgate is clearly the highest quality.  Of the three free sources, Google is better than both Yahoo and Quandl.

Norgate data is delivered to your computer on a schedule and stored in a database maintained by Norgate.  The database is in MetaStock format.  Norgate provides a conversion program that you can use to convert from MetaStock to ASCII/csv.  I have encouraged the Norgate organization to offer alternatives, including storage directly in csv and an API callable from Python that provides direct access to the MetaStock database.  Either would greatly ease the use of Norgate data in Python programs.

Consistency in data may be worth as much as accuracy.  Whichever data source you decide to use for historical data, use that source for all of the data.  Importantly, do not mix quotes from different sources in the same data file.
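One way to guard against accidental mixing is to record the source alongside each row and refuse to append rows from a different one.  The sketch below assumes pandas and a hypothetical `Source` column; that column is a convention invented here, not something any of the vendors supply.

```python
import pandas as pd


def append_quotes(existing: pd.DataFrame, new: pd.DataFrame) -> pd.DataFrame:
    """Append new rows only if every row, old and new, names the same source."""
    sources = set(existing["Source"]) | set(new["Source"])
    if len(sources) > 1:
        raise ValueError(f"refusing to mix data sources: {sorted(sources)}")
    return pd.concat([existing, new]).sort_index()


# Illustrative values only, not real quotes.
history = pd.DataFrame(
    {"Close": [116.0, 115.8], "Source": ["yahoo", "yahoo"]},
    index=pd.to_datetime(["2017-01-03", "2017-01-04"]),
)
same_source = pd.DataFrame(
    {"Close": [116.5], "Source": ["yahoo"]},
    index=pd.to_datetime(["2017-01-05"]),
)
other_source = pd.DataFrame(
    {"Close": [116.6], "Source": ["google"]},
    index=pd.to_datetime(["2017-01-05"]),
)

updated = append_quotes(history, same_source)  # accepted: one source throughout
# append_quotes(history, other_source) would raise ValueError
```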

    Download Data from Quandl

    Download Data from Yahoo

    Download Data from Google

    Read Data from Local Disc