Most of Facebook’s daily active users (989 million out of 1.09 billion) use the social network on mobile devices. That’s as of the end of March, the most recent user statistics the company has made available.
This means Facebook’s software engineers have to ensure the application, which is in a constant state of change, works on thousands of kinds of smartphones, built by different manufacturers using different hardware components and running different versions of multiple operating systems. How do you test every single code change on such a maddening variety of devices?
The answer to that question sits inside the Facebook data center in Prineville, Oregon. It is a lab that consists of custom racks designed specifically to hold and test software on thousands of smartphones at a time. Facebook unveiled the lab today, and said it plans to open source rack designs and some of the software its engineers use to test their code.
Thousands of Code Changes Weekly
Facebook needs the lab because its developers change its software code thousands of times per week, and they want to know how every change will affect user experience on as many different devices as possible. Make one mistake and a particular phone model can run out of battery, or even memory.
“Given the code intricacies of the Facebook app, we could inadvertently introduce regressions that take up more data, memory, or battery usage,” Antoine Reversat, a Facebook production engineer, wrote in a blog post.
The service that tests code changes on mobile devices is called CT-Scan, and it is used in combination with Chef. Facebook developed CT-Scan last year, and its engineers initially ran it on devices they had at their desks. The team quickly found that this approach couldn’t scale, which is why an entire lab in the Prineville data center, custom racks and all, is now dedicated to this task.
“We needed to be able to run tests on more than 2,000 mobile devices to account for all the combinations of device hardware, operating systems, and network connections that people use to connect on Facebook,” Reversat wrote.
That number isn’t arbitrary. It was based on things like the number of commits per week and the number of iterations that had to be done during each test to get statistically meaningful results. The number of phones required is one of the big reasons this operation was moved to Prineville: a “slatwall” holding 240 phones (similar to a store display), which the team tried at one point, would have had to scale to nine rooms in Facebook’s headquarters in Menlo Park, California.
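The nine-room figure is consistent with simple division, as the rough sketch below shows. It assumes one 240-phone slatwall per room, which the article implies but does not state, and it treats "more than 2,000" as a lower bound of exactly 2,000. The function name is made up for illustration.

```python
import math

# Figures from the article; TARGET_DEVICES is a lower bound ("more than 2,000").
TARGET_DEVICES = 2000
PHONES_PER_SLATWALL = 240


def slatwalls_needed(devices: int, per_wall: int = PHONES_PER_SLATWALL) -> int:
    """Smallest whole number of slatwalls that can hold `devices` phones."""
    return math.ceil(devices / per_wall)


# Assumption: one slatwall per room, so slatwalls == rooms.
rooms = slatwalls_needed(TARGET_DEVICES)
print(rooms)  # 2000 / 240 = 8.33..., rounded up to 9
```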
A closer look at Facebook's custom rack for testing its software on smartphones (Photo: Facebook)
The Challenge of Wi-Fi in the Data Center
Every rack holds 32 phones, along with either eight Mac Minis or four of Facebook’s custom OCP Leopard servers to drive them. The Minis oversee software tests on iPhones (four per Mini), while each Leopard server drives eight Android phones to install, test, and uninstall the software. The phones are controlled using custom Chef recipes, which the company is also planning to open source.
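Both rack configurations arrive at the same 32-phone total, which the sketch below checks. It also estimates how many racks a fleet of 2,000 phones would need. The class and function names are hypothetical, and the rack counts are back-of-the-envelope estimates derived from the article's numbers, not figures Facebook has published.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class RackLayout:
    """One rack configuration: a number of host machines, each driving some phones."""
    host_type: str
    hosts: int
    phones_per_host: int

    @property
    def phones(self) -> int:
        return self.hosts * self.phones_per_host


# Host counts and phones per host as described in the article.
IOS_RACK = RackLayout("Mac Mini", hosts=8, phones_per_host=4)
ANDROID_RACK = RackLayout("OCP Leopard", hosts=4, phones_per_host=8)


def racks_needed(devices: int, phones_per_rack: int) -> int:
    """Smallest whole number of racks that can hold `devices` phones."""
    return math.ceil(devices / phones_per_rack)


print(IOS_RACK.phones, ANDROID_RACK.phones)     # 32 32
print(racks_needed(2000, IOS_RACK.phones))      # 63 (estimate at 32 phones per rack)
print(racks_needed(2000, 64))                   # 32 (estimate at the planned 64 per rack)
```

By this estimate, doubling the density to 64 phones per rack, which the team says it wants for the next iteration, would roughly halve the number of racks needed for the same fleet.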
Designing these racks is a significantly different challenge than designing a typical data center rack, and that’s because of Wi-Fi. Signals from the 32 phones in one rack can interfere with each other or with phones in neighboring racks, so every rack, in addition to having its own wireless access point, is designed as an Electromagnetic Isolation Chamber.
Engineers watch the way phones react to code changes during tests remotely, via cameras installed in the racks.
Phone Density Unsatisfactory
For the next iteration of the lab, Reversat and his team are looking to double each rack’s phone density, from 32 to 64 devices, and give engineers ways to test software with tools other than CT-Scan, since it doesn’t fit every use case. One of the reasons to open source the hardware design and the Chef recipes is to have engineers outside of Facebook potentially contribute their own ideas to improve the platform.