Jerry Gentry is Vice President, IT Program Management at Nemertes Research.
As a change of pace I've asked John Burke, Principal Research Analyst at Nemertes, to join us for a discussion about Big Data. Big Data (BD) is popular on all the social networks, forums and electronic media. As much as it is being discussed, I still think there is room to clarify what BD is all about and what we, as owners of infrastructure, need to be doing in preparation. John has had a focus on the BD topic and brings a perspective from discussions with enterprise IT executives.
Jerry: Thanks, John, for taking the time to discuss Big Data and what it means to those of us in the data center world. I think the best way to get a handle on this is to ask some questions. So, let me fire away. There is a lot of talk in the press and forums about Big Data. From your work and discussions with enterprise representatives, can you distill a common definition?
John Burke: Big data is both about how much data there is and about how it is used. Big data means having sudden increases in the amount of data you need to manage and use, increases of an order of magnitude or more. Often this demand comes from new parts of the business: security, maintenance, or customer service. Often the expectation is to use the data differently from the other data you keep around; for example, much of the data doesn't go through the usual burst of "hot" (frequent) use followed by gradual cooling into unused data ready for the archive. Instead it starts warm, with low usage, and continues that way as it is re-indexed and re-searched over and over for different purposes. Uses and meanings may not be immediately obvious.
Jerry: What types of enterprises will be the first to need a Big Data strategy?
John: That is increasingly hard to answer, as the possible sources of new data streams continue to multiply. Any size business might wind up with a big data problem related to its social media presence, or use of digital video. Large businesses are perhaps more likely than small ones to develop such needs, for now, as they typically have more customers, more physical locations, and more business lines hungry for the information hidden in the data.
Jerry: What are the basic elements of a Big Data strategy?
John: Every BD strategy has to include at least 3 components addressing ownership, location, and motion.
Who owns it? This is critical. Except for special cases (security logs or data center sensor data) the owner is almost never going to be IT. There has to be a clear definition of who owns the data, and who is able to make the cost/benefit decisions governing retention and motion: when is it worth it (and where will the money come from) to beef up storage to retain more data or bandwidth to move more data? What standards are to be applied in deciding how long to retain data, and what should happen to it when it can come out of live storage: should it be archived, or just deleted?
How will it be stored? This now includes options ranging from in-house SSD (usually not practical for high-volume data) to cloud-based near-line or archival storage. The strategy has to address where the data will live at first and if and when it should move from there to other storage. Lots of folks collect data into a home and then leave it there until they decide they can let it go, with no further movement. If data is to be indexed and re-indexed as it is used for new purposes, leaving it in responsive on-line storage makes lots of sense. If it is to be used only until it can be condensed into aggregated/abstracted representations, many folks let it cycle out of storage, with or without an archival backup.
How does it get transported around the WAN? If it needs to traverse the WAN, it needs to be factored into QoS schemes, WAN optimizer prioritizations, and possibly storage array replication schedules to make sure it gets where it needs to go when it needs to go there, with as little impact as possible on other critical WAN traffic.
Jerry: What is the first step that an enterprise should take in assessing their need for a Big Data strategy?
John: Look closely at what data streams are growing faster than others, and at business line plans that might drive entirely new ones. Being in close touch with business plans is critical.
Jerry: I have to ask this one. Is Big Data truly something new, or is it an existing issue that has been renamed? In other words, is it real or is it hype?
John: It is real, and separate from the ongoing need to deal with growth in the amount of retained data, structured and unstructured, resulting from many factors ranging from digitization of business processes to virtualization in the data center. That unrelenting growth is slower, and familiar. Big data is an add-on, amped up and dialed to 11.
Jerry: Thanks, John. This has been very informative.
To get more useful data center management strategies from Nemertes Research, download the Q1 2012 Data Center Knowledge Guide to Enterprise Data Centers.