Tom Leyden is Director of Alliances and Marketing, Amplidata.
History is full of examples where people, empires and companies gained power through their ability to better utilize information. Think of the invention of script, double-entry bookkeeping, punch cards, or relational databases.
Today, we are witnessing the same sea change as Big Data imposes its consequences upon underlying data storage infrastructures, and gives rise to radically new data storage approaches. Object storage is one of them.
To understand the importance of object storage, and the benefits this new paradigm delivers over file-system-based storage, we need to dig into Big Data and capture its consequences for storage infrastructures.
Big Data Turning Point
About a decade ago, we hit a data storage inflection point when relational databases proved inefficient for storing the massive volumes of small log files generated in research projects. The result was a range of innovations such as Hadoop and MapReduce, and the rise of Big Data analytics. We also refer to this kind of data as semi-structured data.
A similar problem now occurs with data sets made up of larger, unstructured files, where analysts predict up to 90 percent of the massive data growth over the next decade. Unstructured files are typically stored in file systems, but just as relational databases struggled with massive numbers of small log files, file systems are hitting their scalability limits as unstructured data sets grow. Most popular file systems do not scale well, and, as contradictory as it may sound, folder structures are not optimal for keeping your data organized.
Apps As File Systems
Indeed, applications are gradually taking over the role of the file system, making the latter obsolete. These applications often run far more efficiently when they talk to the storage directly, without a file system in between, by using REST, a standard interface that comes with object storage platforms.
An Object Storage platform architecture is actually very simple. In the back-end there is a large, uniformly scalable storage pool. On top of that there is a REST API and then there are the applications.
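That flat-pool-plus-API architecture can be made concrete with a toy model. The sketch below is purely illustrative (the class and method names are invented for this article, not any vendor's actual API): objects live in one flat namespace addressed by bucket and key, mirroring the REST verbs an application would use, with no directory tree for the storage layer to manage.

```python
# Toy model of an object store's flat namespace. Objects live under
# (bucket, key) pairs in one flat pool; there is no folder hierarchy
# for the storage layer to traverse or rebalance.

class ObjectStore:
    def __init__(self):
        self._pool = {}  # flat mapping: (bucket, key) -> bytes

    def put(self, bucket: str, key: str, data: bytes) -> None:
        # Roughly equivalent to REST: PUT /<bucket>/<key>
        self._pool[(bucket, key)] = data

    def get(self, bucket: str, key: str) -> bytes:
        # Roughly equivalent to REST: GET /<bucket>/<key>
        return self._pool[(bucket, key)]

store = ObjectStore()
store.put("media", "2013/report.pdf", b"%PDF-...")
# "2013/" is just part of the key, not a folder the store must manage
print(store.get("media", "2013/report.pdf"))
```

The point of the sketch is that "folders" become a naming convention inside the key, so the application, not the storage system, decides how data is organized.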
The trick, of course, is in how the object storage system builds that storage pool. Scalability is just one requirement; with growing data sets, efficiency is even more important, and of course the data needs to be secure.
Quite a few object storage products on the market build their storage pool using traditional data protection schemes. Either data is protected by storing multiple plain copies, such as "3 copies in the cloud," or the storage pool consists of a collection of traditional RAID-and-replication systems. Both options carry the same overhead problem of about 200 percent: one rack of data requires two additional racks for availability. RAID-based systems have the additional issue that the pool is not a true single storage pool, but rather a collection of RAID systems. This is because RAID relies on a fixed set of disk drives arranged in a RAID group, typically 6 to 8 drives and rarely more than 12 or 14, due to the overhead of the read/modify/write cycles involved in small I/O operations against large numbers of drives. Such infrastructures require a lot of management, and a lot of manual work when adding capacity.
A More Efficient Methodology
A much more efficient approach is to build storage pools using erasure coding (EC). This technology was first used for deep-space communication, where messages had to remain intelligible even if parts were lost in transfer. EC works like a Sudoku puzzle: blanks can be filled in from the remaining information. Data objects are split into many small chunks, from which check data is calculated (with some small overhead), and the resulting encoded chunks are spread as widely as possible across the system. When a disk (or a larger unit) fails, the object can still be read, because the missing chunks can be recalculated from the surviving ones.
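The mechanics can be sketched with the simplest possible erasure code: one XOR parity chunk over k data chunks, which lets any single lost chunk be rebuilt from the survivors. Production object stores use Reed-Solomon-style codes that survive multiple simultaneous failures; this toy only shows the principle.

```python
from functools import reduce

def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length chunks."""
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list:
    """Split data into k equal chunks and append one XOR parity chunk."""
    size = -(-len(data) // k)                 # ceiling division
    padded = data.ljust(size * k, b"\0")      # pad so chunks are equal length
    chunks = [padded[i * size:(i + 1) * size] for i in range(k)]
    return chunks + [reduce(xor, chunks)]     # k data chunks + 1 parity chunk

def rebuild(chunks: list, lost: int) -> bytes:
    """Recompute the chunk at index `lost` from all surviving chunks."""
    survivors = [c for i, c in enumerate(chunks) if i != lost]
    return reduce(xor, survivors)

chunks = encode(b"object storage!!", k=4)
# Simulate a failed disk: chunk 1 is gone, yet it can be recalculated.
assert rebuild(chunks, lost=1) == chunks[1]
```

Because XOR is its own inverse, XOR-ing all surviving chunks (data plus parity) yields exactly the missing one, whichever it is; adding more parity chunks, as real EC codes do, extends this to multiple concurrent failures.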
Erasure coding has quite a few benefits when applied to object storage. EC makes it possible to build systems that scale as one single system, with one namespace. Far less overhead is required (just over 50 percent for ten nines of availability, versus 200 percent for five nines with RAID and replication), and well-designed EC-based object storage platforms provide automated management, self-healing and more. Some vendors, like Amplidata, have maximized the TCO savings by deploying EC-based object storage on very power-efficient nodes. Their systems are up to 90 percent more power efficient and provide up to 70 percent savings on the Total Cost of Ownership.
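The overhead arithmetic is easy to verify. With replication, every extra copy adds 100 percent of raw capacity; with EC, overhead is simply parity fragments divided by data fragments. The 16+9 policy below is a hypothetical example chosen to land in the "just over 50 percent" range cited above, not any particular product's configuration.

```python
def overhead_pct(total_fragments: int, data_fragments: int) -> float:
    """Raw capacity stored beyond the usable data, as a percentage."""
    return 100.0 * (total_fragments - data_fragments) / data_fragments

# Plain replication: 3 full copies of every object -> 200% overhead.
print(overhead_pct(3, 1))    # 200.0

# Hypothetical EC policy: 16 data + 9 parity fragments spread across
# 25 drives/nodes; any 16 survivors rebuild the object.
print(overhead_pct(25, 16))  # 56.25
```

The EC policy tolerates nine simultaneous fragment losses at roughly a quarter of the raw capacity that triple replication consumes, which is where the efficiency argument comes from.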
Object Storage And the Future
Will object storage kill the file system? Probably not. Tape is still around, and so are mainframes. Some legacy applications will always require file system access to the data, although file system gateways, such as Panzura, allow data center managers to keep a file system on top of an object storage back end, retaining the best of both worlds.
Yet, slowly but surely, organizations are finding that the reduced cost and management efficiency of object storage make them more competitive. Over time, greater efficiency always wins.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena. See our guidelines and submission process for information on participating. View previously published Industry Perspectives in our Knowledge Library.