
Facebook Builds Exabyte Data Centers for Cold Storage

Facebook has built a dedicated data center at its Prineville, Oregon campus that houses older photos in a separate "cold storage" system, dramatically slashing the cost of storing and serving these files.

Jay Parikh, VP Infrastructure Engineering, Facebook, presents on Facebook's "cold storage" methodology, which the social media giant uses to store user photos. (Photo by Colleen Miller.)

What do you do with an exabyte of digital photos that are rarely accessed? That was the challenge facing Jay Parikh and the storage team at Facebook.

The answer? A dedicated data center at its Prineville, Oregon campus that houses older photos in a separate "cold storage" system, dramatically slashing the cost of storing and serving these files. The facility has no generators or UPS systems, but can house up to an exabyte of data.

Facebook stores more than 240 billion photos, with users uploading an additional 350 million new photos every single day. To house those photos, Facebook's data center team deploys 7 petabytes of storage gear every month.

But not all of that photo data is created equal. An analysis of Facebook's traffic found that 82 percent of traffic was focused on just 8 percent of photos. "Big data needs to be dissected to understand access patterns," said Parikh, the Vice President of Infrastructure Engineering at Facebook.
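
Facebook hasn't shared the analysis tooling itself, but the underlying measurement is simple to sketch: given a request log keyed by photo ID, count how much of the traffic lands on the most-requested slice of photos. The log format and function below are illustrative assumptions, not Facebook's actual system:

```python
from collections import Counter

def traffic_concentration(request_log, hot_fraction=0.08):
    """Estimate what share of traffic lands on the most-requested photos.

    request_log:  iterable of photo IDs, one entry per request
    hot_fraction: fraction of distinct photos to treat as "hot" (e.g. 0.08)
    """
    hits = Counter(request_log)                   # requests per photo
    ranked = sorted(hits.values(), reverse=True)  # busiest photos first
    hot_count = max(1, int(len(ranked) * hot_fraction))
    hot_traffic = sum(ranked[:hot_count])
    return hot_traffic / sum(ranked)              # share of all requests

# Toy example: a handful of photos draw most of the requests, so the
# long tail is a natural candidate for a cheaper, colder storage tier.
log = ["a", "a", "a", "a", "b", "b", "c", "d", "e", "f"]
print(f"{traffic_concentration(log, 0.2):.0%} of traffic hits the top 20% of photos")
```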

Tiered Storage, With a Twist

The answer was a tiered storage solution that could meet the needs of Facebook's 1 billion users. Tiered storage is a strategy that organizes stored data into categories based on priority - typically hot, warm and cold - and then assigns each category to a different type of storage media to reduce costs. Rarely used data is typically shifted to cheaper hardware or tape archives, which saves money but comes with a tradeoff: those archives may not be available instantaneously. Amazon's new Glacier cold storage, for example, is cheap, but it takes 3 to 5 hours to retrieve files.

That wouldn't work for Facebook, whose users want to see their photos immediately. "We need to have cold storage, but a fast user experience," said Parikh, who discussed the project this week at the Open Compute Summit. "And we don’t want to use any more power than needed."

Facebook developed software that categorizes photos and shifts them between the three storage tiers. The savings come from dedicated hardware that can store more photos while using less energy.
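
Facebook hasn't published that software, but the tiering idea it describes can be sketched in a few lines. The thresholds, tier names and callback below are made up for illustration; a real policy might key off request counts rather than raw age, but the structure (classify, then move whatever is out of place) is the same:

```python
from datetime import datetime, timedelta

# Illustrative thresholds only -- Facebook's real policy is not public.
WARM_AFTER = timedelta(days=30)
COLD_AFTER = timedelta(days=365)

def assign_tier(last_accessed, now=None):
    """Map a photo's last-access time to a storage tier."""
    now = now or datetime.utcnow()
    age = now - last_accessed
    if age < WARM_AFTER:
        return "hot"     # recent, frequently viewed, served from fast storage
    if age < COLD_AFTER:
        return "warm"
    return "cold"        # rarely viewed, eligible for the cold-storage racks

def migrate(photos, move):
    """Shift photos whose current tier no longer matches policy.

    photos: iterable of (photo_id, last_accessed, current_tier)
    move:   callback(photo_id, from_tier, to_tier) that performs the copy
    """
    for photo_id, last_accessed, current_tier in photos:
        target = assign_tier(last_accessed)
        if target != current_tier:
            move(photo_id, current_tier, target)
```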

Last year Facebook built a 62,000 square foot data center on its Prineville campus for cold storage. The facility can hold 500 racks that each store 2 petabytes of data, for a total of 1 exabyte of cold storage. Similar facilities will be built at Facebook's data center campuses in North Carolina and Sweden, Parikh said.

The cold storage data center has no generators or uninterruptible power supply (UPS), with all redundancy handled at the software level. It also will use evaporative cooling systems, although on a smaller scale than the two-story free cooling system employed in the adjacent production data centers in Prineville, which use the entire second floor as a cooling plenum.

More Storage, Less Power

Most importantly, each rack uses just 2 kilowatts of power instead of the 8 kilowatts drawn by a standard Facebook storage rack. But Parikh said each cold storage rack will be able to store 8 times the volume of data of a standard rack.
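
A rough back-of-envelope check, using only the figures quoted in this article, shows why that matters: the cold racks work out to roughly 32 times more storage per watt than the standard racks.

```python
# Back-of-envelope math using only the figures quoted above.
cold_rack_pb = 2.0                       # petabytes per cold-storage rack
cold_rack_kw = 2.0                       # kilowatts per cold-storage rack
standard_rack_kw = 8.0                   # kilowatts per standard storage rack
standard_rack_pb = cold_rack_pb / 8      # cold racks hold 8x the data -> 0.25 PB

racks = 500
total_capacity_pb = racks * cold_rack_pb                         # 1,000 PB = 1 exabyte
cold_w_per_pb = cold_rack_kw * 1000 / cold_rack_pb               # 1,000 W per PB
standard_w_per_pb = standard_rack_kw * 1000 / standard_rack_pb   # 32,000 W per PB

print(f"Total cold storage: {total_capacity_pb:,.0f} PB")
print(f"Watts per PB, cold vs standard: {cold_w_per_pb:,.0f} vs {standard_w_per_pb:,.0f}")
print(f"Storage per watt improves roughly {standard_w_per_pb / cold_w_per_pb:.0f}x")
```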

How does it manage this? The hardware itself is not radically different, but it uses a technology called shingled magnetic recording, which partially overlaps contiguous tracks to squeeze more data tracks onto each inch of the platter.

Parikh said the system is architected so that different “chunks” of image data don’t share the same power supply or top-of-rack switch, to avoid a single point of failure that would lose data. And if a user deletes a photo, it is deleted from cold storage as well.
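
Facebook hasn't detailed the placement logic, but the constraint itself is easy to illustrate: no two chunks of the same photo should sit behind the same power supply or the same top-of-rack switch. The domain names and data layout below are hypothetical:

```python
from itertools import combinations

def violates_fault_isolation(placement):
    """Check that no two chunks of the same photo share a failure domain.

    placement: dict mapping chunk ID -> (power_supply, tor_switch)
    Returns a list of offending chunk pairs (empty if the layout is safe).
    """
    bad_pairs = []
    for (chunk_a, loc_a), (chunk_b, loc_b) in combinations(placement.items(), 2):
        shared_power = loc_a[0] == loc_b[0]
        shared_switch = loc_a[1] == loc_b[1]
        if shared_power or shared_switch:
            bad_pairs.append((chunk_a, chunk_b))
    return bad_pairs

# Example: chunk-1 and chunk-2 share a top-of-rack switch, so losing that
# switch could take out both pieces of the photo at once.
layout = {
    "chunk-1": ("psu-A", "tor-1"),
    "chunk-2": ("psu-B", "tor-1"),
    "chunk-3": ("psu-C", "tor-2"),
}
print(violates_fault_isolation(layout))   # [('chunk-1', 'chunk-2')]
```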

Not many companies face storage challenges at the kind of scale seen at Facebook. But Parikh believes more companies will be confronting these massive storage issues.

“Our big data challenges that we face today will be your big data challenges tomorrow," he said. "We need to keep coming up with advanced solutions to our storage problems. The most important innovations are the problems people solve before the scale of the problem emerges. I believe big data is one of those problems. And we won’t keep up unless we work together.”

EDITOR'S NOTE: This story has been updated to correct the details of the cooling system for the cold storage facility.
