GE has stood up a system it uses to analyze data generated by aircraft engines in flight. The company recently conducted a pilot run of the system and said it has already helped some of its airline clients cut operational costs.
The system is based on the concept of a “data lake,” developed by Pivotal, a software company in which GE owns a 10-percent stake. The rest of Pivotal, which is led by former VMware CEO Paul Maritz, belongs to the storage giant EMC and to VMware, in which EMC owns a majority stake.
A data lake is a collection of disparate data sources – where data is stored in different formats – that can be analyzed by a single analytics engine. Pivotal and GE pitch it as a faster-performing and cheaper alternative to traditional enterprise data warehousing, where data has to be organized and converted to a uniform format before it can be analyzed.
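The idea of analyzing disparate formats with a single engine can be illustrated with a minimal sketch. Everything in it is hypothetical: the sample records, the field names and the `average_fuel` function are assumptions made for illustration and do not describe Pivotal's or GE's software. Each source stays in its original format and is interpreted only when it is read.

```python
import csv
import io
import json

# Hypothetical raw sources kept in their original formats (not from the article).
CSV_SOURCE = "flight_id,fuel_kg\nF1,5200\nF2,4800\n"
JSON_SOURCE = '[{"flight": "F3", "fuel_burn_kg": 5000}]'


def read_csv_source(text):
    """Yield (flight_id, fuel_kg) pairs, interpreting the CSV at read time."""
    for row in csv.DictReader(io.StringIO(text)):
        yield row["flight_id"], float(row["fuel_kg"])


def read_json_source(text):
    """Yield (flight_id, fuel_kg) pairs, interpreting the JSON at read time."""
    for record in json.loads(text):
        yield record["flight"], float(record["fuel_burn_kg"])


def average_fuel(*readers):
    """Run one analysis across all sources without converting them up front."""
    values = [fuel for reader in readers for _, fuel in reader]
    return sum(values) / len(values)


avg = average_fuel(read_csv_source(CSV_SOURCE), read_json_source(JSON_SOURCE))
print(avg)
```

In a warehouse-style workflow, both sources would first be converted into one agreed schema and loaded before any query could run; here that interpretation is deferred to the readers.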
Citing IDC, GE said gathering and preparing data for analysis can take as much as 80 percent of project time under the conventional data warehousing approach.
The data lake concept, however, is problematic, according to a Gartner analyst. One fundamental issue is the assumption that anyone in an organization has the skills necessary for Big Data analytics. The other is the security and access-control risk that arises when data is placed into a data lake without determining what a particular piece of data is and who is authorized to access it.
“The fundamental issue with the data lake is that it makes certain assumptions about the users of information,” Nick Heudecker, research director at Gartner, said. “It assumes that users recognize or understand the contextual bias of how data is captured, that they know how to merge and reconcile different data sources without ‘a priori knowledge’ and that they understand the incomplete nature of data sets, regardless of structure.”
In GE’s case, user sophistication is not an issue since the vendor seems to be providing analytics as a service to its customers, doing the heavy lifting itself.
GE Aviation’s pilot project, which took place in 2013, collected data on 15,000 flights from 25 different airlines. Each flight generated about 14 gigabytes of metrics data.
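As a rough back-of-the-envelope check, the two pilot figures above imply a total data volume of roughly 210 terabytes. The short calculation below assumes decimal units (1 terabyte = 1,000 gigabytes) and treats 14 gigabytes as an exact per-flight average, which the article gives only as an approximation.

```python
# Figures reported for the 2013 pilot project.
FLIGHTS = 15_000
GB_PER_FLIGHT = 14  # approximate average per flight

# Assumption: decimal units, 1 TB = 1,000 GB.
total_gb = FLIGHTS * GB_PER_FLIGHT
total_tb = total_gb / 1000

print(f"{total_gb:,} GB is about {total_tb:,.0f} TB")
```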
The data lake approach enabled GE to integrate all that flight data and run analytics against the massive data set. The process produced measurable cost savings, such as a one-percent reduction in the yearly fuel bill of GE customer AirAsia, according to the vendor.
GE shrank the time it took to run analytics against the data set from months (which the data warehousing method would have required) to days.
The data lake itself is built on technology by Pivotal and integrates with GE’s own software, called Predix. GE describes Predix as a way to connect a massive number of machines, people’s devices and analytics systems in a standard, secure way.
The software that did the actual analysis in the pilot project was GE’s Predictivity. The company expects to collect data from 10 million flights by 2015, a 1,500-terabyte data set that Predictivity will have to crunch through.
David Joyce, president and CEO of GE Aviation, said, “Gathering and analyzing data to improve our customers’ operations is no longer a futuristic concept, but a real process underway today, and growing in magnitude.”