Oct 05, 2020
Over the last two to three years, we have had to become more aggressive in differentiating ourselves from some emerging players who have come into the market data space focused on selling financial data cheap. This is not a new phenomenon. We have seen it happen time and time again. We were once ourselves the newer and cheaper player in the field. After all, our first API sold for as little as $19.95/mo back in 2003.
In our defense, we were never cheap because we made it a mission to be cheap. We were cheap mostly because we had no clue about the value of our data, and we priced it accordingly. Maybe we were naive, but we were just tech guys trying to make market data easy to access. Clients derived cost benefits from our APIs because they saved time on integration and eliminated a lot of legacy infrastructure like colocated feeds, dedicated servers, and so on. They still do today. And that ease of use is part of what fueled the Fintech Revolution and the successes of clients like Robinhood, Personal Capital, Betterment, SoFi, and Stocktwits.
Fast forward, and now there are newcomers that have copied our APIs and made it their mantra to sell data cheap. To be fair, they sometimes have a point that market data is too expensive. Traditional providers like Bloomberg and Refinitiv have not been shy about using their stronghold to charge unreasonable prices. At the same time, the quality of the data they provide is often significantly higher than that of the newcomers, whose data in the best case comes from second-tier providers with lesser-quality data or, in the worst case, is illegitimately scraped from public websites or outright and shamelessly stolen from others.
So this raises the question of the value of quality market data, and how one can distinguish quality data from non-quality data.
Data is a tricky product to buy because it is intangible. You cannot feel and touch its quality the same way you can tell a real Louis Vuitton bag from a pale imitation. You cannot take a quick glance and know. It requires use and extensive testing before you can detect the gaps and bad data points and experience the missing corporate actions that will cost you weeks of work and tens of thousands of dollars. It’s a bit like buying a lemon car. You cannot tell it is a lemon from the outside. It’s not until you are stranded in the middle of nowhere, paying hundreds of dollars for a tow truck with your vacation ruined, that you realize you have been had.
On top of that, non-professional data buyers, such as individual investors, tend to have a highly biased perception of the value of data. That perception is shaped by the fact that much data is available for free on Yahoo! Finance (this data is paid for via advertising instead of user fees and it is certainly not available for commercial use). The fact that data can be easily copied also contributes to that perception. Many inexperienced buyers have no idea what causes market data to be bad or how much effort it takes to keep it accurate. They see a premium historical data set here and a cheaper version there, and because they cannot tell the difference, they go for the cheap one. Then, depending on their use case, they might continue limping along with the cheap data set, working hard around every data problem polluting their existence, or they may start swearing to all the Nordic gods that they will never be had again. That’s typically when they come back to us.
So what does any of that have to do with TSLA and AAPL?
As you may remember, TSLA and AAPL, two of the most widely traded and eyeballed stocks in the industry, both split on the same day, August 31st. The impact of a stock split on underlying market data is significant. The stock price is divided by the split ratio, the volumes are multiplied by it, all historical data for the security must be adjusted (end-of-day prices and intraday data as well), and so on. If you don’t do this right, it will stick out like a sore thumb on your historical charts. If this happens on stocks as closely tracked as TSLA and AAPL, everyone will notice.
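To make that adjustment concrete, here is a minimal sketch in Python of restating pre-split bars in post-split terms. It is an illustration only, not a description of any vendor's pipeline: the `Bar` and `Split` types, the `adjust_for_splits` function, and the sample prices and volumes are all hypothetical. Only the 5-for-1 ratio and the August 31st date come from the actual TSLA split.

```python
from dataclasses import dataclass
from datetime import date
from fractions import Fraction


@dataclass(frozen=True)
class Bar:
    day: date
    close: float
    volume: int


@dataclass(frozen=True)
class Split:
    ex_date: date    # first trading day at the post-split price
    ratio: Fraction  # new shares per old share, e.g. 5 for a 5-for-1 split


def adjust_for_splits(bars, splits):
    """Return bars restated in post-split terms.

    Each bar dated before a split's ex-date has its price divided by the
    ratio and its volume multiplied by it. Bars on or after the ex-date
    are returned with the same values.
    """
    adjusted = []
    for bar in bars:
        factor = Fraction(1)
        for split in splits:
            if bar.day < split.ex_date:
                factor *= split.ratio
        adjusted.append(
            Bar(
                day=bar.day,
                close=round(bar.close / float(factor), 4),
                volume=int(bar.volume * factor),
            )
        )
    return adjusted


# Made-up raw (unadjusted) bars around a 5-for-1 split on August 31st.
raw = [
    Bar(date(2020, 8, 27), 2000.0, 1_000_000),
    Bar(date(2020, 8, 28), 2100.0, 1_200_000),
    Bar(date(2020, 8, 31), 450.0, 6_000_000),
]
splits = [Split(ex_date=date(2020, 8, 31), ratio=Fraction(5))]

adjusted = adjust_for_splits(raw, splits)
for bar in adjusted:
    print(bar.day, bar.close, bar.volume)
```

Keeping the raw bars and the list of splits separate, and computing the adjusted series from the two, means a late or corrected corporate action only requires re-running the adjustment rather than repairing stored prices.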
Given the importance of TSLA and AAPL to the retail industry, the ramifications of those splits were significant. There was unusual market activity that day, and two of the most popular trading platforms (Robinhood and TD Ameritrade) went down for a while due to the impact of the split on their operations. Even Schwab and Vanguard were impacted. On the surface, those outages were not market data related. They may have been caused by unusual activity that day. And while that is understandable in a unique situation like this, it was still surprising to see such massive businesses stumble over what was a very predictable event.
But most interesting is the fact that at least two of the emerging market data players missed the splits altogether. As a result, clients who depended on them for accuracy did not get it. The impact on their business was probably significant. The question is, how can a financial data provider miss such widely advertised splits? Most likely, those vendors relied on their upstream provider to get the data right, and those upstream providers missed it. But what it means is that those vendors had neither manual nor automated processes in place to verify that such impactful data points were not missing. And if you miss two such significant market events, what else are you missing?
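As an illustration of what a very basic automated check could look like, the Python sketch below scans unadjusted closing prices for day-over-day drops that look like a common split ratio and flags any that have no split on record. This is a simple heuristic assumed for the example, not a description of how Xignite or any other vendor verifies corporate actions. The function name, the list of ratios, the tolerance, and the sample prices are all hypothetical.

```python
from datetime import date

# Hypothetical list of split ratios worth checking (new shares per old share).
COMMON_RATIOS = (2, 3, 4, 5, 7, 10)


def find_suspect_splits(closes, known_split_dates, tolerance=0.1):
    """Flag dates whose price drop looks like a split that is not on record.

    closes: list of (date, unadjusted close) tuples sorted by date.
    known_split_dates: set of ex-dates already present in the corporate
        actions data.
    Returns a list of (date, nearest_ratio) tuples for manual review.
    """
    suspects = []
    for (_, prev_close), (day, close) in zip(closes, closes[1:]):
        if day in known_split_dates or close <= 0:
            continue
        observed = prev_close / close
        nearest = min(COMMON_RATIOS, key=lambda r: abs(observed - r))
        if abs(observed - nearest) / nearest <= tolerance:
            suspects.append((day, nearest))
    return suspects


# Made-up unadjusted closes around a 5-for-1 split on August 31st.
closes = [
    (date(2020, 8, 27), 2000.0),
    (date(2020, 8, 28), 2100.0),
    (date(2020, 8, 31), 450.0),
    (date(2020, 9, 1), 460.0),
]

missing = find_suspect_splits(closes, known_split_dates=set())
recorded = find_suspect_splits(closes, known_split_dates={date(2020, 8, 31)})
print(missing)
print(recorded)
```

A check like this cannot confirm a split on its own, since a genuine crash can produce a similar drop. Its value is in routing the suspicious date to a person or a second data source before clients see the gap.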
At Xignite, we knew it was coming. We worked through the weekend to make sure that we would not be missing those data points. We had all hands on deck on Monday morning to make sure that if any issue occurred as a result of this unusual event, we could address it quickly. I am sure that many other data vendors did the same. And that is probably what differentiates a premium data set from a cheap one.
To be fair, we have had our share of data quality problems in the past. And we occasionally still do to this day. Market data is complex. Corporate actions (like splits, dividends, mergers, and others) are among the most complex aspects of market data. This is why, after years of relying only on our data sources and automated processes for accuracy, we have invested in building a team whose sole purpose is data quality and preventing issues such as the missing TSLA and AAPL splits. Staffing a team like this is expensive and complicated. It requires an architecture that decouples data types (pricing, corporate actions) and automatically stitches historical data based on corporate actions. Some of the specific data quality processes we run include:
Doing so also means admitting that as tech people we cannot automate everything. It also requires some manual intervention, so if you only have a handful of employees, quality is almost impossible to achieve. But what it also means is that we truly take the quality of our data seriously. And this is why our data commands a premium price over the cheaper alternatives.