How Dropbox Stores Stuff for 200 Million Users

Andrew Fong, Site Reliability Engineer at Dropbox, discussed the company’s infrastructure last week at Velocity 2013 in New York. (Photo by Colleen Miller.)

NEW YORK - It took the online storage service Dropbox about four years to accumulate its first 100 million users. It took just 10 months to add the next 100 million.

“The growth has been really immense, especially recently,” said Andrew Fong, a Site Reliability Engineer at Dropbox. Fong and other members of the Dropbox team described the company’s infrastructure growth last week during a presentation at the O’Reilly Velocity 2013 conference.

The rapid growth of Dropbox has been driven by its ability to offer cloud file storage that can be easily synced between multiple devices, including desktop and mobile devices. Files stored in Dropbox can be easily shared, and are accessible through a web interface or mobile apps, with support for Windows, Mac, Linux, iPhone, iPad, Android and BlackBerry.

How does Dropbox store files for those 200 million users, who store 1 billion files every 24 hours? The company uses more than 10,000 physical servers to manage user content, along with Amazon Web Services. User metadata is stored in the company’s data centers, while the actual files reside on Amazon’s S3 storage service. Dropbox also uses Amazon EC2 instances to help the data centers “talk” to its cloud storage.
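To put those figures in perspective, here is a quick back-of-the-envelope calculation. It uses only the numbers quoted in this article and treats the upload rate as a flat daily average, which is a simplifying assumption; real traffic will have peaks well above the mean.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# "1 billion files every 24 hours", averaged evenly over the day (an assumption).
files_per_day = 1_000_000_000
files_per_second = files_per_day / SECONDS_PER_DAY

# "about four years" for the first 100 million users, "just 10 months" for the next.
first_100m_months = 4 * 12
next_100m_months = 10
growth_speedup = first_100m_months / next_100m_months

print(f"average ingest rate: {files_per_second:,.0f} files/second")
print(f"second 100 million users arrived {growth_speedup:.1f}x faster")
```

On average, that works out to roughly 11,574 files saved per second, with the second 100 million users arriving about 4.8 times faster than the first.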

“We think we have a formula that works for this hybrid architecture,” said Fong. “I believe that hybrid infrastructures are what you’re going to see going forward for large infrastructures.”

Changing Architecture

It’s always difficult to engineer for growth when you’re not sure what that growth curve will look like. But Fong urged web architects to assume that the earlier you confront the challenges of a hybrid approach, the better off you’ll be.

“Don’t think it’s easy to combine systems later,” he said. “It’s better to do the work up front. There’s a huge difference between dealing with Amazon and dealing with data centers. With a data center, we actually have to care about the underlying hardware, and we have a lot of it. We would have built differently if [we’d] understood that up front.”

Working with data centers and Amazon requires two completely different tool sets, Fong said. But Dropbox is taking steps to simplify across the two platforms. Rather than “baking” a standard virtual machine image for cross-platform use, the Dropbox team is using the Puppet configuration tool to unify its approach to installations, inventories and lifecycles.

What are the tough challenges for Dropbox? Ziga Mahkovec, a member of the Dropbox engineering team, identified several areas:

  • Serving Photo Thumbnails – Dropbox needed a more efficient “pipeline” to convert newly uploaded photos into thumbnails. By batching HTTP requests, Mahkovec said, the Dropbox team was able to improve performance by 40 percent.
  • Desktop Client Throughput – This problem was addressed by uploading file chunks in parallel using multiple connections, leading to a 2.5x speedup across Dropbox, and a 4x improvement in Japan.
  • Storage Latency on Amazon S3 – “The problem with using a service like S3 is that we don’t have much control over performance,” said Mahkovec. “It’s a black box to us. So we have to do good monitoring.” In this case, Dropbox pooled its requests to avoid making new connections.
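Two of the techniques in this list, batching requests and uploading file chunks in parallel, can be illustrated with a minimal sketch. This is not Dropbox's code: the chunk size, the `fake_upload` function, the in-memory `server` dictionary and the file names are all hypothetical stand-ins so the example runs without a network.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterator, List, Sequence

CHUNK_SIZE = 4  # bytes; deliberately tiny so the example is easy to follow


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a file's bytes into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_in_parallel(
    chunks: Sequence[bytes],
    upload_chunk: Callable[[int, bytes], None],
    connections: int = 4,
) -> None:
    """Upload chunks concurrently; each worker thread stands in for a connection."""
    with ThreadPoolExecutor(max_workers=connections) as pool:
        futures = [
            pool.submit(upload_chunk, index, chunk)
            for index, chunk in enumerate(chunks)
        ]
        for future in futures:
            future.result()  # re-raise any error from a failed upload


def batched(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Group items so that each batch can be sent as a single request."""
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical in-memory "server" standing in for real storage.
server: Dict[int, bytes] = {}


def fake_upload(index: int, chunk: bytes) -> None:
    """Pretend to send one chunk; a real client would make a network call here."""
    server[index] = chunk


payload = b"hello dropbox chunks"
chunks = split_into_chunks(payload)
upload_in_parallel(chunks, fake_upload)

# Chunks may arrive in any order, so reassemble them by index.
reassembled = b"".join(server[i] for i in sorted(server))

# Five hypothetical thumbnail requests collapse into three batched requests.
thumbnail_batches = list(batched(["a.jpg", "b.jpg", "c.jpg", "d.jpg", "e.jpg"], 2))
```

Each chunk carries its index so the receiver can restore the original order regardless of which connection finishes first. The connection pooling mentioned for S3 follows the same spirit: reuse a fixed set of open connections rather than paying the setup cost on every request.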

Mahkovec said the answers to these types of problems are not always intuitive.

“When running at scale, what might seem the obvious solution doesn’t necessarily apply,” he said. “When you can’t control all the variables, we run a lot of experiments and iterate.”

About the Author

Rich Miller is the founder and editor at large of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

6 Comments

  1. Bashir

    "There’s a huge difference between dealing with Amazon and dealing with data centers. With a data center, we actually have to care about the underlying hardware, and we have a lot of it. " That's basically common sense, and basic knowledge. If they didn't understand that beforehand, there is little to learn from these guys. Just because the company is successful, doesn't mean every department contributed to it.

  2. Dropbox is fulfilling the promise of the cloud. For users, “Dropbox is the first day of the rest of their life.” Dropbox has also had to make sure that only stable code is pushed to its clients. After all, a corrupted directory and lost work are among the worst things that can happen to a storage and syncing service. Thanks for the read.

  3. David

    Dropbox is a good product that is missing features for the enterprise. I don't like the little disclaimer about them owning your data. Say what? You can steal my proprietary secrets? Great. No thanks. I heard about a startup called PanTerra Networks; the product works and is designed for the enterprise.

  4. @David Dropbox may have read your comment, because on February 20, 2014 (2 days after your comment), they posted this "Terms of Service" (effective March 24, 2014): "Your Stuff & Your Permissions. When you use our Services, you provide us with things like your files, content, email messages, contacts and so on ("Your Stuff"). Your Stuff is yours. These Terms don't give us any rights to Your Stuff except for the limited rights that enable us to offer the Services. We need your permission to do things like hosting Your Stuff, backing it up, and sharing it when you ask us to. Our Services also provide you with features like photo thumbnails, document previews, email organization, easy sorting, editing, sharing and searching. These and other features may require our systems to access, store and scan Your Stuff. You give us permission to do those things, and this permission extends to trusted third parties we work with." I do not see where they own my data there, or where they can steal proprietary secrets that I might legally store there. Of course, if we ignore the laws and their terms of service, they can technically do anything, just as PanTerra or any other company that hosts your data can.