David Greenfield is Product Marketing Manager for Silver Peak, Inc.
Big data presents organizations with significant opportunities, but many small- to medium-sized enterprises (SMEs) will need to overcome significant technical and bureaucratic challenges if they are to leverage the technology. The volume and velocity of data in a big data effort (including big data analysis) require IT to rethink how the company collects, analyzes and shares information. Implementation costs, particularly consulting costs, are significant, and new expertise is needed to extract meaningful insights from the flood of data engulfing today’s businesses.
Big Data, Big Challenges
But make no mistake: while the highest-profile big data cases have been the Facebooks of the world, companies of any size will gain from the technology as well as from the analytics performed on big data. Gilt Groupe, a global fashion e-tailer, grew to half a billion in sales in large part because it mined nearly five years’ worth of member data to develop targeted marketing campaigns. Every minute, Fab.com, a meeting place for designers and customers, combines data about a user’s purchase history, membership information and more to spot trends that drive business decisions.
Brick-and-mortar companies may face their own challenges, but lack of information is not one of them. Between e-mail blasts, video feeds from security cameras, data from point-of-sale systems, reports from inventory systems and more, most organizations generate enough data to populate a big data database. Gathering that information into a single location will be an enormous challenge.
Shipping disks or tape to a central location for uploading into a big data database is not always feasible or desirable, and moving so much information across the corporate network is often impossible. It’s not just the lack of bandwidth that’s the issue. Even when hundreds of megabits connect sites, the delay and quality of the network (latency and packet loss) dramatically undermine actual throughput. A coast-to-coast, 100 Mbps connection, for example, will still be limited to just 5.24 Mbps per flow (assuming 100 ms latency and no packet loss). Should loss increase to just 0.1 percent, throughput drops to 3.69 Mbps per flow. (See “Party Poopers and Speed Junkies” and calculate the throughput on your network with this calculator.)
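The per-flow figures above can be reproduced with two standard back-of-the-envelope TCP limits. The sketch below is an illustration, not the calculator the article points to: it assumes a 65,535-byte TCP window (no window scaling), a 1,460-byte maximum segment size, and the simplified Mathis approximation with a constant of 1. The function names are hypothetical.

```python
import math

# Assumptions (not stated in the article): 64 KB window without window
# scaling, 1460-byte MSS, Mathis approximation with constant 1.
DEFAULT_WINDOW_BYTES = 65_535
DEFAULT_MSS_BYTES = 1_460


def window_limited_mbps(rtt_s, window_bytes=DEFAULT_WINDOW_BYTES):
    """Ceiling when one window of data is sent per round trip."""
    return window_bytes * 8 / rtt_s / 1e6


def loss_limited_mbps(rtt_s, loss_rate, mss_bytes=DEFAULT_MSS_BYTES):
    """Simplified Mathis approximation: MSS / (RTT * sqrt(loss))."""
    return mss_bytes * 8 / (rtt_s * math.sqrt(loss_rate)) / 1e6


def per_flow_mbps(link_mbps, rtt_s, loss_rate=0.0):
    """A single flow gets the smallest of the applicable limits."""
    limits = [link_mbps, window_limited_mbps(rtt_s)]
    if loss_rate > 0:
        limits.append(loss_limited_mbps(rtt_s, loss_rate))
    return min(limits)


if __name__ == "__main__":
    # 100 Mbps link, 100 ms latency, no loss
    print(round(per_flow_mbps(100, 0.100), 2))
    # Same link with 0.1 percent packet loss
    print(round(per_flow_mbps(100, 0.100, 0.001), 2))
```

Under these assumptions the link speed never becomes the binding limit: latency caps the flow first, and loss lowers the cap further, which is why adding bandwidth alone does not help.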
Network limitations also pose challenges when accessing data. With most databases, users typically like to copy the data to their local device and work on it there, which again leads to replicating gigabytes of data across the network. Applying a similar practice to big data leads to soaring network costs, poor performance and user frustration. But organizations cannot afford to restrict big data access to local users; limited employee access to and use of big data is a major reason for project failures.
Inflated network costs, though, are only one factor in big data’s price tag. Software and storage costs may be relatively small compared with traditional enterprise data warehouses, in part because of the use of Hadoop and other open source software packages and scale-out storage, but those costs often do not factor in industry and regulatory requirements for security, disaster recovery, and availability.
Also missing from most calculations are personnel costs. Given the immaturity of today’s big data market, Gartner expects organizations to spend about 20 to 25 times the supply costs on consulting and integration services. (By contrast, in mature markets, such as business intelligence systems, Gartner expects consulting services to run about three times supply revenue.) Ongoing personnel costs, though, will likely remain. Organizations will need to train or hire personnel to analyze big data. The “data scientist,” a combination of business intelligence (BI) analyst and statistician, is the hot new title for someone who mines these data sets for the new insights that will automate and optimize business processes.
Data Acceleration and Cloud Help Big Data
Cloud computing is a perfect match for big data. Big data’s appetite for storage, computation, power, complex database infrastructure, and sophisticated data processing capabilities is well served by offerings such as Amazon Web Services (AWS).
AWS provides virtually unlimited Elastic Compute Cloud (EC2), Elastic Block Store (EBS) and Simple Storage Service (S3) capacity at a low price. It offers DynamoDB, a highly available distributed database cluster, and Elastic MapReduce, a managed platform that supports a Hadoop-based analysis stack. These cost-effective resources and technologies empower businesses to build their own analytics within Amazon to gain deeper and richer insights into almost everything.
But the challenge still remains - how to get the data into the cloud or the company’s data center. Data acceleration software solves that problem. By running as an instance on both ends of the line, data acceleration software can improve throughput by over 200x. Moving 100 GBytes of data, for example, can take just 6.2 minutes - not 22 hours. Data acceleration does this by optimizing protocols to correct for latency, de-duplicating data to maximize the use of bandwidth, and, in some cases, recovering lost packets on the fly without requiring retransmissions that undermine throughput. And since data acceleration software can be licensed by the hour, costs can be exceptionally low for use cases where large data volumes need to be moved once or only infrequently.
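The 100 GByte example is consistent with the "over 200x" claim, as a quick arithmetic check shows. The sketch below assumes decimal gigabytes (100 × 10^9 bytes) and derives the effective throughput each transfer time implies; the names are illustrative and the derived throughput figures are not from the article.

```python
# Assumption: 100 GBytes means 100 * 10**9 bytes (decimal units).
SIZE_BYTES = 100e9

BASELINE_SECONDS = 22 * 3600      # 22 hours without acceleration
ACCELERATED_SECONDS = 6.2 * 60    # 6.2 minutes with acceleration


def implied_throughput_mbps(size_bytes, seconds):
    """Effective throughput needed to move size_bytes in the given time."""
    return size_bytes * 8 / seconds / 1e6


speedup = BASELINE_SECONDS / ACCELERATED_SECONDS
baseline_mbps = implied_throughput_mbps(SIZE_BYTES, BASELINE_SECONDS)
accelerated_mbps = implied_throughput_mbps(SIZE_BYTES, ACCELERATED_SECONDS)

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"implied baseline throughput: {baseline_mbps:.1f} Mbps")
    print(f"implied accelerated throughput: {accelerated_mbps:.1f} Mbps")
```

The implied accelerated rate is an effective figure: because de-duplication avoids resending data the far end already holds, it can exceed what the underlying link carries on the wire.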
Data acceleration software is a critical component of almost any realistic, large-scale big data deployment. Whether deployed in the cloud or the enterprise, shortening the time to aggregate the data dramatically improves the value organizations get out of their big data deployments.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.