Big Data, housed in new and disruptive technologies, is expected to account for more than 50 percent of the world's data in the next five years, according to a new study. While it offers huge and untapped value, the inevitable result is stress and strain on the world's Internet infrastructure as companies seek to manage this explosion of information.
The new study, released jointly by Internet Research Group and Infineta Systems, a provider of WAN (wide area network) optimization systems, examines how Big Data is affecting enterprise WANs throughout the country.
Big Data, defined as datasets whose size is beyond the ability of typical database software tools to capture, store, manage and analyze, most often ranges from petabytes to exabytes in size and is unstructured, distributed and stored in flat schemas. As Big Data continues to grow, the industry anticipates both enormous change and untapped value for enterprises. According to Infineta's report, most companies will adopt key Big Data technologies within the next 12 to 18 months.
Challenging Network Capacity
All this data in need of capture, storage, processing and distribution has the potential to clog networks. About 0.5 Gbps of bandwidth is needed per petabyte of Big Data under management by Hadoop, an open source platform for large-scale computing. This bandwidth demand can compromise the latency, speed and reliability of the enterprise WAN.
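As a rough illustration of that rule of thumb, the sketch below scales the study's figure of 0.5 Gbps per petabyte to a few cluster sizes. The linear scaling and the function name are assumptions made for illustration; the study itself only states the per-petabyte figure.

```python
# Figure cited in the study: about 0.5 Gbps per petabyte under Hadoop management.
GBPS_PER_PETABYTE = 0.5


def estimated_bandwidth_gbps(petabytes: float) -> float:
    """Estimate WAN bandwidth demand in Gbps.

    Assumption: demand grows linearly with the data under management.
    """
    if petabytes < 0:
        raise ValueError("petabytes must be non-negative")
    return petabytes * GBPS_PER_PETABYTE


if __name__ == "__main__":
    # Hypothetical cluster sizes, chosen only to show the scaling.
    for size_pb in (1, 4, 10):
        print(f"{size_pb} PB -> {estimated_bandwidth_gbps(size_pb):.1f} Gbps")
```

Under these assumptions, a 10-petabyte deployment would call for roughly 5 Gbps of bandwidth.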
Infineta has a stake in this topic: the privately held company, based in San Jose, California, supplies products that support critical machine-scale workflows across the data center interconnect. However, the study findings highlight developing trends that are affecting the entire data center industry.
Key trends identified by Infineta include:
- Cheaper storage pricing. While traditional data storage runs $5 per gigabyte, the same amount of storage using Hadoop costs $0.25 per gigabyte.
- Increased scalability. Hadoop enables companies to add storage for a fraction of what it previously cost. The scalability of Hadoop could lead to more than 50 percent of the world's data being stored in Hadoop environments within five years.
- Lack of analysis. Only one to five percent of data collected outside Big Data deployments is actually analyzed, so value is being missed. McKinsey recently reported that if the healthcare industry analyzed 95 percent of its uncaptured data, the result would have an estimated annual value of $300 billion. Another example is the oil industry, where oil rigs generate 25,000 data points per second and companies use five percent of that information.
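To put the storage prices from the first trend in perspective, the sketch below compares the two per-gigabyte figures for a given capacity. The prices come from the report as quoted above; the decimal petabyte (1,000,000 gigabytes) and the function names are assumptions made for illustration.

```python
# Per-gigabyte prices quoted from the report.
TRADITIONAL_USD_PER_GB = 5.00
HADOOP_USD_PER_GB = 0.25

# Assumption: decimal units, 1 PB = 1,000,000 GB.
GB_PER_PETABYTE = 1_000_000


def storage_cost_usd(petabytes: float, usd_per_gb: float) -> float:
    """Return the cost of storing the given capacity at a flat per-GB price."""
    return petabytes * GB_PER_PETABYTE * usd_per_gb


def cost_ratio() -> float:
    """How many times more traditional storage costs per gigabyte."""
    return TRADITIONAL_USD_PER_GB / HADOOP_USD_PER_GB


if __name__ == "__main__":
    traditional = storage_cost_usd(1, TRADITIONAL_USD_PER_GB)
    hadoop = storage_cost_usd(1, HADOOP_USD_PER_GB)
    print(f"1 PB traditional: ${traditional:,.0f}")
    print(f"1 PB on Hadoop:   ${hadoop:,.0f}")
    print(f"Ratio: {cost_ratio():.0f}x")
```

At the quoted prices, one petabyte works out to $5 million on traditional storage versus $250,000 on Hadoop, a 20-fold difference. The comparison ignores operating costs, which the report figures as quoted here do not break out.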
The report finds that organizations are deploying Hadoop clusters as a centralized service offering so that individual divisions don’t have to build and run their own, and that “bigger is better” when it comes to processing batch workloads.
This setup leads to Big Traffic: data movement between clusters, within a data center and between data centers. Data movement includes but is not limited to replication and synchronization, which will become especially important as Hadoop becomes a significant factor in enterprise storage. Big Traffic data movement services support Big Data analytics, regulatory compliance requirements, high availability services and security services.