Sparse Data – The Next Big Thing is Small
Jerry Gentry is Vice President, IT Program Management at Nemertes Research
We are more than data center managers. Our role is to manage, protect, service, and support the ever-expanding information gathering and processing needs of the enterprise. Information used to come to us through computing infrastructure, where data was input by data entry and customer service staff. Now data comes from anywhere. Any device capable of holding an IP address can become a source of data that must be captured, stored, analyzed, and archived.
We focus a lot on the systems we use to capture and store information. The majority of what we capture comes from enterprise applications, through computation with business intent. But a growing amount of the data beginning to move through the front door of the data center is unrelated to the core business or standard office processes. It is Sparse Data: a relatively small amount of information per device, but potentially thousands and thousands of devices polled at a frequent rate. Sparse Data will become an ever-increasing percentage of your stored information.
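To make the "small per device, large in aggregate" point concrete, here is a rough back-of-the-envelope sketch. The device count, reading size, and polling rate are purely hypothetical figures chosen for illustration, and the function name `daily_bytes` is invented for this example; real deployments will differ and will add protocol and storage overhead on top of the raw payload.

```python
def daily_bytes(devices: int, bytes_per_reading: int, polls_per_hour: int) -> int:
    """Raw payload bytes generated per day by a fleet of sensors (no overhead)."""
    return devices * bytes_per_reading * polls_per_hour * 24


# Assumed, illustrative figures: 50,000 sensors, 64-byte readings, polled once a minute.
per_day = daily_bytes(devices=50_000, bytes_per_reading=64, polls_per_hour=60)
per_year = per_day * 365

print(f"{per_day / 1e9:.2f} GB/day, {per_year / 1e12:.2f} TB/year")
# prints "4.61 GB/day, 1.68 TB/year"
```

Each reading is tiny, yet under these assumptions the fleet still produces more than a terabyte a year before any indexing, replication, or archiving is counted.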
What is different about Sparse Data?
I use that term to describe data that is fundamentally state information from non-IT devices. We are seeing early indicators that almost everything will have some form of addressable existence. Think about an office building that has simple sensors and threshold monitors built into the furniture and ancillary office equipment. It is simple data coming from many, many sources. Viewed individually, a reading may not be interesting, but viewing all the sensors on a floor over time might show the impact of changing the temperature in the space, or of moving the coffee machine. You can look at the actual usage of fixtures like doors and lavatories. It may seem like Big Brother, but there is massive potential in inferential data.
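As a minimal sketch of that idea, the snippet below groups individual sensor readings by floor and hour so that a pattern emerges from values that say little on their own. The record layout, the sensor names, and the temperature values are all made up for illustration; they are not drawn from any real building system.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical readings: (sensor_id, floor, hour_of_day, temperature_c)
readings = [
    ("desk-01", 3, 9, 21.0),
    ("desk-02", 3, 9, 22.0),
    ("door-07", 3, 14, 25.0),
    ("desk-01", 3, 14, 24.0),
]


def mean_by_floor_and_hour(rows):
    """Average the reading values for each (floor, hour) pair."""
    groups = defaultdict(list)
    for _sensor_id, floor, hour, value in rows:
        groups[(floor, hour)].append(value)
    return {key: mean(values) for key, values in groups.items()}


trend = mean_by_floor_and_hour(readings)
print(trend)
```

No single reading here is informative, but the grouped averages show the floor warming between morning and afternoon, which is the kind of inference the aggregate view makes possible.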
Now extend that concept even further. Say that some form of self-reproducing nanotechnology is embedded in plant seeds. Those nano agents could become part of the plant and relay state information as the plant grows. There would be no second-guessing about when to harvest or whether the plants are in distress.
Sparse Data comes from outside of IT, and it will be the next challenge for the IT infrastructure. All those sensors are great ideas, but the data has to be processed, stored, and communicated to be of any use, and those support functions fall on IT managers who are probably not even in on the development discussions. Our service is, as so often seems to be the case, just assumed to be there. So it’s already time to start thinking ahead of the challenge.
One of the great challenges and opportunities in data management in the coming period will center on breaking down the silos in which information has historically been trapped. If you cannot efficiently access data, its value is significantly compromised. I have seen research from well-known organizations that suggests eye-popping efficiency improvements from faster access to data. Technologies and tools that bring together geographically distributed repositories of data, whether sparse, big, or otherwise, are poised to help businesses extract far more value from the growing tide of data, and the companies that provide them will be among the most successful moving forward.