At NYSE, The Data Deluge Overwhelms Traditional Databases

Tom Groenfeldt
Apr 11, 2020

NYSE and NYSE Technologies, its technology subsidiary, found that the continuing growth of stock market data, the demand for more analytics and, thanks to regulators, far more reporting were too much for its existing database.

NYSE Technologies receives four to five terabytes of data a day and uses it for complex analytics, market surveillance, capacity planning and monitoring.

The company had been using a traditional database, said Emile Werr, head of product development for the NYSE Big Data Group and global head of Enterprise Data Architecture and Identity Access Management for NYSE Euronext. The existing system couldn't handle the workload: it took hours to load and delivered poor query speed.

NYSE turned to the IBM Netezza platform because it couldn’t accomplish its goals with traditional database technology, Werr said.

“We started five years ago and now we are more mature in the industry with using MPP (massively parallel processing) systems, and we have shown significant ROI in being able to do complex analytics while managing the footprint,” said Werr.

“NYSE needs to store and analyze seven years of historical data and be able to search through approximately one terabyte of data per day, which amounts to hundreds of terabytes in total,” added Werr. “The PureData System for Analytics powered by Netezza provides the scalability, simplicity and performance critical in being able to analyze our big data to deliver results eight hours faster than on the previous solution, which in our world is a game changer.”

NYSE’s initial focus was on trading surveillance of market makers and broker-dealers’ trading platforms. A second concern was capacity planning.

“The New York Stock Exchange SLAs (service level agreements) are stringent,” said Werr. “The system needs to be 100 percent fault tolerant. When systems cross capacity thresholds, additional capacity would be automatically engaged and trading would continue to flow without interruptions.”

Werr said it became clear that traditional database technology would not do what NYSE needed.

“Extremely large data volumes, data integration complexities, market surveillance and ad hoc analytics requirements took a large number of IT resources to babysit the environment and constantly tune it. The systems became too complex and slow,” Werr added.

To run analytics, data had to be extracted from the database into applications such as SAS and proprietary NYSE apps to perform the necessary analysis.

Werr said NYSE Technologies has figured out how to use all its data assets in an efficient and cost-effective manner. The firm has extended its data warehouse with a distributed file system, he added.

“Big data for us is augmentation between systems like Netezza and a set of technologies like Hadoop and a distributed file system and identifier tiers that orchestrate data access. NYSE big data is all about taking that to the next level and packaging it so it can be dropped into an organization and leveraged so they could continue to support the innovations in big data.”

Phil Francisco, vice president of big data product management at IBM, said Werr had developed some interesting ways to load archival data into Netezza very quickly so NYSE can run surveillance analytics against records a few months back, or a few years back.

“Typically they will have less than a year’s worth of data in Netezza, but they can always load data from an archive,” Francisco said. With the methods Werr developed, NYSE can look for long-running patterns. “Emile was the architect for that: how to use a high-performance data warehouse around data retention.”
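The load-on-demand pattern described here can be illustrated in miniature: historical records sit in flat archive files and are bulk-loaded into a relational table only when a surveillance query needs that period. In this sketch, `sqlite3` stands in for the MPP warehouse and an inline CSV string stands in for an archive file; the schema and data are invented for illustration.

```python
import csv
import io
import sqlite3

# Stand-in for an archived trading-day file pulled from cold storage.
ARCHIVE_CSV = """symbol,price,volume
XYZ,10.5,1000
XYZ,10.6,1500
ABC,99.9,200
"""

def load_archive(conn: sqlite3.Connection, csv_text: str) -> None:
    """Bulk-load one archive file into the warehouse staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS trades (symbol TEXT, price REAL, volume INTEGER)"
    )
    rows = list(csv.reader(io.StringIO(csv_text)))
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)", rows[1:])  # skip header

conn = sqlite3.connect(":memory:")
load_archive(conn, ARCHIVE_CSV)
# Once loaded, surveillance-style queries run against the restored period.
total_volume, = conn.execute(
    "SELECT SUM(volume) FROM trades WHERE symbol = 'XYZ'"
).fetchone()
```

The point of the pattern is that the warehouse holds only the hot window, yet any archived period can be made queryable again in one bulk load.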

“NYSE continues to push the envelope for high performance, scalability and reliability,” Werr said. “NYSE has implemented large network pipes across data centers and trading systems. We can move data around very quickly. Data needs to move in and out of analytics systems (like Netezza) fast.”

NYSE Technologies makes its systems available for purchase and installation behind a firewall or as a service. The system is fast in analytics terms, but it is not designed for high-frequency trading. It refreshes at one-minute intervals, which is near real time in the analytics world.
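A one-minute refresh is a micro-batch cycle: rather than streaming every tick, the analytics store is updated at fixed interval boundaries. The helper below computes the next boundary; the article does not describe NYSE's actual refresh mechanism, so this is purely an illustrative sketch.

```python
# Illustrative micro-batch scheduling: given the current time in epoch
# seconds, find the next fixed interval boundary at which the analytics
# store would be refreshed. The 60-second default matches the article's
# one-minute refresh figure; everything else is an assumption.
def next_refresh(epoch_seconds: int, interval: int = 60) -> int:
    """Return the next interval boundary strictly after the given time."""
    return ((epoch_seconds // interval) + 1) * interval
```

A refresh loop would sleep until `next_refresh(now)` and then apply the latest batch, trading tick-level immediacy for predictable, bounded load on the analytics system.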

Some broker-dealers ask for data at a specific point in time, such as the Flash Crash, so they can test their algorithms against it. Moving that data to a firm can be expensive, so NYSE Technologies leaves it in its own data center, where firms can test against it without moving the day’s data.

“A lot of firms want to get data on demand while leaving it in our enterprise,” he explained. The data can be offered in raw form or customized to make it more user friendly.
