I always get a kick out of it when people describe data centers as big warehouses of "data storage" and nothing else. I suppose it helps some people relate to what is on all of those computers inside. As readers here know, there is much more to the data center than just the data, but the data is what everything inside is there to protect and serve. Over the past decade or so, the amount of data stored in a typical data center has grown at least at the pace of Moore's Law, if not faster.
Data is just the first step. With this massive amount of data stored in storage arrays in the data center, how do we take advantage of it and properly analyze it to extract valuable information, knowledge and wisdom? How do we get quality analytics out of the overwhelming quantity of data pouring in?
Analyzing and Interpreting Big Data
GigaOm interviewed Jeff Jonas, a Distinguished Engineer and chief scientist of IBM's Entity Analytics group. IBM acquired Jonas' company, Systems Research & Development, in 2005. He has an impressive history of helping businesses leverage their information assets. He also led the design and development of the systems casinos used to identify card counters by drawing on data from many sources. Among the counters those systems tracked were the MIT students featured in the book Bringing Down the House and, later, the movie 21.
Jonas shares some striking insights into the information being created within the enterprise and the wider world, and into how to make sense of it. He notes, "as computers are getting faster and the world is getting more sensors, the organizations have been getting dumber. The percentage of what is knowable is on a decline."
In a recent presentation to the Sunlight Foundation, Jonas described how to apply a context accumulation process to automated computer systems to streamline document review. He also draws on what he has done and seen in the casino industry to show how an enterprise can move from batch processing to more real-time analytics.
Big Data is Big Business
Om Malik also discusses the era of big data and recent acquisitions in the analytics and data warehousing industry. Data Center Knowledge reported on IBM's acquisition of Netezza for $1.7 billion in cash, which came just after Oracle announced its Exalogic Elastic Cloud system at Oracle OpenWorld this year.
Also this past summer, EMC acquired data warehousing company Greenplum, which became the foundation of EMC's new Data Computing Products division. This past week, EMC introduced the EMC Greenplum Data Computing Appliance, an integrated data warehouse system built on the Greenplum massively parallel processing (MPP) architecture. Running Greenplum Database 4.0, the appliance delivers data loading performance of 10 terabytes an hour.
Sensor Networks are Huge Data Generators
GigaOm's Stacey Higginbotham wrote about how sensor network data now tops social network data. Stephen Brobst, CTO of Teradata, reinforces Jeff Jonas' point that enterprises need help tackling the exponentially rising mountain of data: figuring out how to manage it, what to keep, and how to mine it for useful information.
Data volumes keep growing: we are about to leave the exabyte age and enter the zettabyte age. To demonstrate how sensor data accumulates, Brobst explains, "a Boeing jet generates 10 terabytes of information per engine every 30 minutes of flight. So for a single six hour, cross-country flight from New York to Los Angeles on a twin-engine Boeing 737 - the plane used by many carriers on this route - the total amount of data generated would be a massive 240 terabytes of data."
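Brobst's figure is straightforward multiplication: a six-hour flight spans twelve 30-minute intervals, and each of the two engines produces 10 terabytes per interval, so 10 × 2 × 12 = 240 terabytes. The short Python sketch below reproduces that estimate. The function and constant names are hypothetical, and the zettabyte comparison assumes decimal (SI) units, where 1 zettabyte equals 10^9 terabytes.

```python
# Back-of-envelope reproduction of Brobst's per-flight sensor data estimate.
# Rates come from the quote above; names are hypothetical, for illustration only.

TB_PER_ENGINE_PER_INTERVAL = 10   # terabytes per engine per 30-minute interval
INTERVAL_MINUTES = 30
TB_PER_ZB = 10**9                 # assumption: decimal (SI) prefixes, 1 ZB = 1e9 TB


def flight_data_tb(flight_hours: float, engines: int) -> float:
    """Estimate total sensor data (in TB) generated over one flight."""
    intervals = flight_hours * 60 / INTERVAL_MINUTES  # number of 30-minute intervals
    return TB_PER_ENGINE_PER_INTERVAL * engines * intervals


total_tb = flight_data_tb(flight_hours=6, engines=2)
print(f"{total_tb:.0f} TB")  # 240 TB

# How many such flights it would take to fill a single zettabyte.
flights_per_zb = TB_PER_ZB / total_tb
print(f"{flights_per_zb:,.0f} flights per ZB")  # 4,166,667 flights per ZB
```

Put another way, roughly four million cross-country flights would generate a zettabyte of engine data under these assumptions.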
Teradata's research also points to the growth of data. In a recently released study titled The State of Business Intelligence in Academia, Teradata explores whether tomorrow's workers will be up to the task of transforming massive amounts of data into competitive business advantage. The study cites an IDC prediction that by 2020 the amount of data generated each year will reach 35 zettabytes. Personally, I think this estimate is too low, given the volume of data and the rate of change we are seeing today.
Data - Information - Knowledge - Decision Making
Many years ago, I wanted to get into the field of knowledge management. At the time I'm not sure I could have done much more than explain the concept, but one of my favorite authors was (and still is) Thomas Davenport. His recent collaboration, Analytics at Work: Smarter Decisions, Better Results, covers analytics as a powerful business tool for putting data to work in key business decisions and processes. This past summer, Davenport was interviewed for an MIT Sloan Management Review article, "Are You Ready to Reengineer Your Decision Making?" In it, he describes analytics as both explanatory and predictive: showing why something happened and what might happen going forward.
Socialization of Data
Teradata has a site about the socialization of data and living in the age of "WOW." Integrating business intelligence and enterprise analytics platforms with consumer-side social data brings total enterprise awareness for fully informed business decision making. Going back to another point Jeff Jonas made in his interview, the extra layer added by the onslaught of social and geolocation data makes extrapolation even more challenging.
Processed properly, of course, that data can also add another layer of value. Look at the recent expansion of duties for Google's Marissa Mayer, vice president of search products and user experience. In addition to her current role, Google has put her in charge of its local services, including geo/local products, and given her a seat on its operating committee. As many others have pointed out, this underscores the importance Google places (and will likely place even more in the near future) on localized data in its products.
After zettabyte comes yottabyte, so you can bet the industry will keep building those data centers!