Whether you are doing Algo Trading or Agriculture, your business is becoming more and more reliant on data. And what is it about that data that is common? It’s Big. It’s Fast. It’s Flexible.
These three trends (big, fast, and flexible) in new data and analysis are stretching the database in ways we had not conceived of in the past. To think clearly about these trends, let's look at some examples of how these disruptors are impacting the data market. We will also look at an example of a fourth trend: how cloud delivery models impact data.
Agriculture – Example of Big Data
The first disruptive change we will talk about is Big Data.
This term has gotten a lot of coverage in the press, but in simple terms, big data is about finding better ways of storing and analyzing massive amounts of data. With the staggering quantities of data being produced today, the simple strategy of putting all of your data into a traditional data warehouse is prohibitively expensive and insufficient to the task. How do we store and use data to meet business needs? Big data solutions take large amounts of structured and unstructured data, run algorithms against datasets to refine data, and glean business-critical insight.
In agriculture, the data patterns are changing radically. I bet you never thought about farming as a big data opportunity. Next-generation farm equipment such as combines and tillers will be able to take soil samples as it moves along, analyze those samples, and feed the results back to the manufacturer for crunching on a macro scale. This will give a better understanding of what is happening across an entire area and make it possible to adjust things like the amount or type of fertilizer and chemicals that should be applied. If the farm equipment manufacturers figure out how to harness all this information, this kind of big-picture analysis could change the commodity trading markets forever.
Algo Trading – Example of Fast Data
The second disruptor we are seeing is Fast Data. The way fast data is used is changing. It used to mean processing a single stream of data serially, as fast as you could; it now means scaling that processing horizontally and delivering information from data in real time. Traditional databases, and even specialty databases such as tick databases, have served us well in processing data quickly in the past. They will continue to play a critical role wherever data must be delivered with low latency.
Algo Trading is still about applying algorithmic models to analyze fast-moving market data in near real time, but there is a definite trend toward horizontally scalable, memory-oriented, non-relational data technologies that let users optimize the speed with which they can analyze the data coming at them. Technologies like GemFire enable parallel, real-time analysis of data as it becomes available, in a horizontally scalable and fault-tolerant way. They can provide a dramatically faster understanding of market data, and a competitive advantage for the traders who use them.
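To make the idea of horizontally scaled analysis a little more concrete, here is a minimal Python sketch. It is not GemFire code and does not use any GemFire API; the tick format, the function names, and the use of a thread pool as a stand-in for separate nodes are all assumptions made for illustration. Ticks are partitioned by symbol, each partition computes a volume-weighted average price (VWAP) on its own, and the partial results are merged.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tick format: (symbol, price, size).
TICKS = [
    ("AAA", 10.0, 100),
    ("BBB", 20.0, 50),
    ("AAA", 11.0, 300),
    ("CCC", 5.0, 200),
    ("BBB", 22.0, 150),
]


def partition_by_symbol(ticks, num_partitions):
    """Route every tick for a given symbol to the same partition."""
    partitions = [[] for _ in range(num_partitions)]
    for tick in ticks:
        symbol = tick[0]
        # Deterministic routing based on the symbol's bytes.
        index = sum(symbol.encode()) % num_partitions
        partitions[index].append(tick)
    return partitions


def vwap_for_partition(ticks):
    """Compute the VWAP per symbol using only this partition's ticks."""
    notional = defaultdict(float)
    volume = defaultdict(int)
    for symbol, price, size in ticks:
        notional[symbol] += price * size
        volume[symbol] += size
    return {symbol: notional[symbol] / volume[symbol] for symbol in notional}


def parallel_vwap(ticks, num_partitions=3):
    """Analyze all partitions concurrently, then merge the partial results."""
    partitions = partition_by_symbol(ticks, num_partitions)
    merged = {}
    # Each worker stands in for one node of a scaled-out data store.
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        for partial in pool.map(vwap_for_partition, partitions):
            merged.update(partial)
    return merged
```

Because every tick for a symbol lands in the same partition, no partition needs data held by another, which is what allows more partitions (or nodes) to be added as volume grows. Calling `parallel_vwap(TICKS)` returns one VWAP per symbol.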
Sentiment – Example of Flexible Data
The third disruptor we will talk about is Flexible Data. We use the term Flexible Data to describe how applications today are using new sources of data, both structured and unstructured, and combining them to produce analytics that simply weren't possible before. This unstructured data is forcing us to think about storing data in new ways too. Data is no longer simply stored in relational tables; a host of new relational and non-relational data structures has emerged, many of them open source. These include object databases for semi-structured data, document-oriented databases, and even graph databases for discovering the relationships among the things under analysis. When we think about how applications are being developed today, we see an increasing number of them being built using these new technologies.
Some trading firms are using alternative data sets that were not used before to establish or enhance their trading strategies. For instance, there is at least one fund that decides each day which stocks it believes will be the top ten winners based on sentiment data gleaned from Twitter. This is a case of unstructured data being applied to a business that relied almost entirely on structured data for years.
Big Data and an Example of the Cloud-Based Delivery Model
In addition to these three data-related changes, there is a fourth disruptor: the cloud delivery model itself. The whole notion of hybrid cloud or "cloud bursting" has major implications for the data management layer of cloud-based applications. You may be able to get 1000 nodes provisioned in a public cloud in minutes, but if it takes hours to get the data needed for the computation there, you have a problem. So, one emerging pattern is to trickle-feed the data, as it is being generated, to just a few nodes of an elastic, memory-oriented, cloud-ready data store in the public cloud. This way, when you are ready to burst into the public cloud to perform the computation, most of the data is already there.
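The trickle-feed pattern can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RemoteStore` is a hypothetical stand-in for a small cloud-hosted, in-memory data store (a plain dictionary here, not a real product API), and `TrickleFeeder` and its batch size are invented for the example.

```python
class RemoteStore:
    """Hypothetical stand-in for a few cloud-hosted, in-memory store nodes."""

    def __init__(self):
        self.records = {}

    def put_batch(self, batch):
        # In a real deployment this would be a network call to the cloud.
        self.records.update(batch)


class TrickleFeeder:
    """Ship records to the remote store in small batches as they are generated."""

    def __init__(self, store, batch_size):
        self.store = store
        self.batch_size = batch_size
        self.pending = {}

    def on_record(self, key, value):
        """Called whenever the local application produces a new record."""
        self.pending[key] = value
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is still waiting locally."""
        if self.pending:
            self.store.put_batch(self.pending)
            self.pending = {}


def missing_at_burst(local_keys, store):
    """Keys that would still have to be transferred at burst time."""
    return [key for key in local_keys if key not in store.records]


# Usage: generate ten records locally while trickling them to the cloud.
store = RemoteStore()
feeder = TrickleFeeder(store, batch_size=4)
for i in range(10):
    feeder.on_record(i, {"reading": i * 1.5})

# Only the last partial batch remains to be shipped before bursting.
still_missing = missing_at_burst(range(10), store)
feeder.flush()
```

The point of the design is that the expensive transfer is spread over the time the data is being produced, so the final catch-up before a burst covers only the most recent partial batch.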
There are, in fact, some public cloud implementations that have sprung up simply because they have a lot of useful public data. The NYSE cloud, for instance, has instant access to all of the reference and market data for all of the instruments traded on that exchange. Users who want to analyze their performance against NYSE’s historical data need only push an abstract view of their trading decisions to the public cloud so that they can learn what decisions worked well, what didn't, and what might have worked better based on an understanding of historical trends.
New Beginnings for Technology
It is not just the database that is being stretched, though. Traditional batch analysis mechanisms are unable to deliver the value that more real-time analysis can. Near-real-time, closed-loop analysis patterns are becoming a reality, and they are delivering new business value, sometimes even from old datasets. If you can analyze your big data to find the good patterns, and then codify rules to help you detect those patterns in real time, you can significantly change the value of your data.
One thing is clear. There is a lot more data out there than there used to be, and traditional methods of collection, storage and analysis are ill-suited to handle it all. We live in a time when creativity and new science are once again able to make significant changes in what had become a pretty stodgy area of technology.
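As a rough illustration of that idea, the Python sketch below splits the work into a batch step and a real-time step. The choice of pattern is an assumption made for the example: the batch step simply learns a normal range (mean plus or minus `k` standard deviations) from historical values, and the codified rule flags live values that fall outside it. The function names and sample numbers are hypothetical.

```python
import statistics


def learn_rule(history, k=2.0):
    """Batch step: derive a rule from historical data.

    The rule flags any value more than k population standard
    deviations away from the historical mean.
    """
    mean = statistics.mean(history)
    spread = statistics.pstdev(history)
    low, high = mean - k * spread, mean + k * spread

    def rule(value):
        return value < low or value > high

    return rule


def detect(stream, rule):
    """Real-time step: apply the codified rule to each value as it arrives."""
    for value in stream:
        if rule(value):
            yield value


# Usage: learn from old data, then watch new data one value at a time.
rule = learn_rule([10, 12, 11, 9, 13])
flagged = list(detect([11, 14, 8, 10], rule))
```

Closing the loop would mean periodically re-running `learn_rule` on the data accumulated since the last run, so the real-time rule keeps up with what the batch analysis finds.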