Object-Based Storage Cost-Effective for Unstructured Data

Erik Ottem is Director of Product Marketing, Data Center Systems, at Western Digital.

Editor's Note: In the first part of this two-part series, we explored how Object-Based Storage (OBS) cost-effectively delivers data at scale and is replacing traditional file-based Network-Attached Storage (NAS) architectures in today’s data centers. In this second part, we present the key features of OBS: extreme scalability, advanced data availability and durability, and simplified data management.

Extreme Scalability

OBS platforms operate on a flat address space, so they achieve massive scalability without the overhead associated with file system hierarchies, data look-ups, or block reassembly. With traditional file storage architectures, indexes enable scaling beyond a single folder, but as the number of files increases, the file hierarchy and its associated overhead become cumbersome, limiting performance and scalability. Instead of indexes, OBS uses metadata to aggregate objects into buckets (or other logical associations), which delivers more efficient capacity scaling and allows virtually unlimited data at scale.
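
The contrast with a directory tree can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not a description of how any OBS product is built: the class FlatObjectStore and its method names are hypothetical. Each object is addressed by one opaque ID in a single flat table, and bucket membership is simply a metadata field rather than a position in a hierarchy.

```python
import uuid


class FlatObjectStore:
    """Toy in-memory object store with a flat address space (hypothetical)."""

    def __init__(self):
        # One flat table: object ID -> (metadata, data). No directories.
        self._objects = {}

    def put(self, data, **metadata):
        """Store an object and return its opaque ID."""
        object_id = uuid.uuid4().hex
        self._objects[object_id] = (dict(metadata), bytes(data))
        return object_id

    def get(self, object_id):
        """Single lookup by ID; no path traversal is involved."""
        return self._objects[object_id][1]

    def find(self, **criteria):
        """Return IDs of objects whose metadata matches every criterion."""
        return [
            oid
            for oid, (meta, _) in self._objects.items()
            if all(meta.get(k) == v for k, v in criteria.items())
        ]


store = FlatObjectStore()
scan_id = store.put(b"...image bytes...", bucket="medical-images", patient="A17")
log_id = store.put(b"...log bytes...", bucket="logs")

print(store.get(scan_id))
print(store.find(bucket="medical-images") == [scan_id])
```

The metadata search here is a simple linear scan to keep the sketch short; a real system would be expected to index its metadata.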

Advanced Data Availability

In traditional storage architectures, RAID (Redundant Array of Independent Disks) is a common approach to ensure that data is available and accurate when it is read. Striping data with parity across multiple drives protects against the failure of one or two of them; however, once a failure occurs, performance drops dramatically during the rebuild operation, and the likelihood of other group members failing increases as well. RAID rebuilds can take hours, or even days, and may require immediate replacement of the failed drive. If an unrecoverable read error occurs during a rebuild, data can be permanently lost, placing business data and productivity at risk.

With OBS, data availability is achieved through advanced erasure coding, a technique that encodes data together with parity information, divides the result into chunks, and distributes those chunks across the local storage pool. Erasure coding best practices require that no single drive hold more than one chunk of an object and that no single node hold more chunks than the object can afford to lose. This approach ensures data availability even if multiple components fail, since only a subset of the chunks is needed to rehydrate the data. There is no rebuild time or degraded performance, and failed storage components do not need to be replaced at the time of the read error; they can be replaced when it is convenient. Rather than focusing on hardware redundancy, OBS focuses on data redundancy.
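
The two placement rules can be expressed as a small check. The sketch below is illustrative only: the function name placement_is_safe, the (node, drive) representation, and the sample layouts are assumptions made for this example, not part of any product's interface.

```python
from collections import Counter


def placement_is_safe(placement, max_lost):
    """Check a chunk placement against the two placement rules.

    placement: one (node, drive) pair per chunk of a single object.
    max_lost:  how many chunks the object can lose and still be readable.
    """
    chunks_per_drive = Counter(placement)
    chunks_per_node = Counter(node for node, _ in placement)

    # Rule 1: no single drive holds more than one chunk of the object.
    one_chunk_per_drive = all(n <= 1 for n in chunks_per_drive.values())
    # Rule 2: no single node holds more chunks than the object can lose.
    node_loss_survivable = all(n <= max_lost for n in chunks_per_node.values())
    return one_chunk_per_drive and node_loss_survivable


# Hypothetical layout: 6 chunks over 3 nodes, 2 drives each; any 2 chunks may be lost.
good = [(node, drive) for node in ("n1", "n2", "n3") for drive in ("d1", "d2")]
# Same chunk count, but three chunks land on node n1 and two share one drive.
bad = [("n1", "d1"), ("n1", "d1"), ("n1", "d2"),
       ("n2", "d1"), ("n2", "d2"), ("n3", "d1")]

print(placement_is_safe(good, max_lost=2))  # True
print(placement_is_safe(bad, max_lost=2))   # False
```

In the first layout, losing any one drive costs one chunk and losing any one node costs two, so the object stays readable in both cases; the second layout breaks both rules.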

An OBS system can also achieve data availability by geographically spreading data across three locations, but unlike the triple mirroring data replication model, the full data set is not replicated to each location. Rather, only one-third of the object data is stored in each location. This approach not only reduces network traffic but also maintains this level of data availability with only about 67 percent overhead, whereas triple mirroring requires replicating, storing, and managing 100 percent of the data at each of three locations. The geo-spread model provides very high data accessibility and resiliency at a substantially lower cost in equipment and management than traditional triple mirror data replication.
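
A short calculation shows how an overhead figure of this size can arise. The chunk counts below are assumptions chosen to reproduce the "about 67 percent" figure, not the parameters of any particular system: each object is encoded into 15 chunks, any 9 of which are enough to rebuild it, with 5 chunks (one-third) placed at each of three sites. The function name storage_overhead is likewise hypothetical.

```python
def storage_overhead(total_chunks, chunks_needed):
    """Extra capacity beyond the original data, as a fraction of that data."""
    return (total_chunks - chunks_needed) / chunks_needed


# Assumed geo-spread layout: 15 chunks per object, any 9 rebuild it,
# spread evenly so each of the 3 sites holds 5 chunks (one-third).
total_chunks, chunks_needed, sites = 15, 9, 3
chunks_per_site = total_chunks // sites

geo_overhead = storage_overhead(total_chunks, chunks_needed)
# Triple mirroring: 3 full copies, any 1 of which is enough to read the data.
mirror_overhead = storage_overhead(3, 1)
# After losing one whole site, are enough chunks left to read the object?
survives_site_loss = total_chunks - chunks_per_site >= chunks_needed

print(f"geo-spread overhead: {geo_overhead:.0%}")        # 67%
print(f"triple-mirror overhead: {mirror_overhead:.0%}")  # 200%
print(survives_site_loss)                                # True
```

Under these assumptions, the geo-spread layout survives the loss of an entire site while storing about 1.67 times the original data, compared with 3 times for three full copies.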

Advanced Data Durability

Data durability refers to long-term data protection. A media failure such as bit rot, in which a portion of the drive surface becomes unreadable and corrupts data, makes it impossible to retrieve the data in its original, unaltered form. Protecting chunks as they lie dormant on disk is of paramount importance in enterprise storage. Simply protecting against a complete hard drive failure (as with RAID) does not protect against the gradual failure of bits stored on magnetic media.

When combined with appropriate data scrubbing technology, OBS guards against bit failures: if a given chunk becomes corrupt, a replacement chunk can be constructed from the parity information stored in the remaining chunks that constitute the object. It isn’t necessary to rebuild or replace an entire drive, just the affected data. The combination of erasure coding with data scrubbing technology achieves extreme durability. Some systems achieve up to 17 nines of data durability (99.999999999999999 percent); in simpler terms, for every 100,000 trillion objects, only one would be unreadable. This is why OBS is widely used in hyperscale data centers and cloud computing environments to meet the highest data durability requirements.
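
The scrub-and-repair idea can be sketched as follows. This is a deliberately simplified stand-in: it uses a single XOR parity chunk, which can repair only one damaged chunk per object, whereas production erasure codes are typically stronger and tolerate several lost chunks. The function names (xor_chunks, encode, scrub) and the use of SHA-256 checksums to detect corruption are assumptions made for this illustration.

```python
import hashlib
from functools import reduce


def xor_chunks(chunks):
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode(data_chunks):
    """Append one XOR parity chunk and record a checksum for every chunk."""
    chunks = list(data_chunks) + [xor_chunks(data_chunks)]
    checksums = [hashlib.sha256(c).digest() for c in chunks]
    return chunks, checksums


def scrub(chunks, checksums):
    """Verify every chunk; rebuild a single corrupt one from the others.

    Returns the indexes of the chunks that were repaired.
    """
    corrupt = [i for i, c in enumerate(chunks)
               if hashlib.sha256(c).digest() != checksums[i]]
    if len(corrupt) > 1:
        raise ValueError("single XOR parity can repair only one chunk")
    for i in corrupt:
        healthy = [c for j, c in enumerate(chunks) if j != i]
        chunks[i] = xor_chunks(healthy)
    return corrupt


chunks, checksums = encode([b"OBJE", b"CT-D", b"ATA!"])
chunks[1] = b"CT-X"                # simulate bit rot in one chunk
repaired = scrub(chunks, checksums)

print(repaired)                    # [1]
print(b"".join(chunks[:3]))        # b'OBJECT-DATA!'
```

Only the damaged chunk is rewritten; the healthy chunks and the drives that hold them are left untouched.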

Simplified Data Management

Unlike the hierarchical file storage used in NAS environments, OBS has a flat architecture built around a namespace that holds all the objects in the object store, even objects that reside on disparate storage system hardware and in different locations. The namespace provides an effective and cost-efficient way to manage multiple racks of storage as one entity, enabling a simplified, single management solution for all data. Although geo-spreading distributes data across multiple storage systems in various locations, the actual operation is performed only once and is invisible to the end user. With a single namespace, it is easier to manage one system spanning multiple locations than to manage multiple sites individually.

Summary

Given the exponential growth in data, the challenge of storing that data has become significant. Object-Based Storage offers key benefits for today’s data centers as an alternative to traditional storage solutions. With the high-density, highly distributed nature of OBS, data centers can cost-effectively support data at scale at lower capital and operational expense than with traditional storage architectures, thanks to more efficient data protection and a simplified management structure.

Opinions expressed in the article above do not necessarily reflect the opinions of Data Center Knowledge and Penton.

Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.