This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we zero in on some of those changes to get a better understanding of the pervasive open source data center.
Here’s part two of our interview with Amir Michael, who spent most of the last decade designing servers for some of the world’s biggest data centers, first at Google and then at Facebook. He was one of the founders of the Open Compute Project, the Facebook-led open source hardware and data center design community.
Today, Michael is a co-founder and CEO of Coolan, a startup that aims to help data center operators make more informed decisions about buying hardware and make their data centers more efficient and resilient using Big Data analytics.
Data Center Knowledge: How did the idea to start Coolan come about?
Amir Michael: My team built large volumes of servers while at Facebook, hundreds of thousands of them. As we built them, we put them in the data center and then turned around and started working on the next generation of design and didn’t really look back to see how decisions we made during the design actually panned out operationally.
We made a decision to buy premium memory and paid more for that because we thought it wouldn’t fail. We made certain design decisions that we thought would make the system more or less reliable at a cost trade-off, but never actually went back and measured that.
And we’re always making decisions around what kinds of components or system to buy and trying to decide if we pay more for an enterprise type of component, or maybe we can do with a consumer type of component. New technology, especially new technology entering the data center, doesn’t have good information around reliability. You don’t have a track record around that.
When I was at Facebook, I started to look back and say, “Hey, so what were the operational costs of all these decisions we made?” And we didn’t have a lot of data. I started talking to peers in the industry and said, “Let’s compare notes. What does your failure rate look like compared to mine?” And there wasn’t a lot of information there, and a lot of the people in this industry aren’t actually measuring that.
The idea for Coolan is to create a platform that makes it very easy for people to share data about their operation, about failure rates, about quality of components, about errors that they’re generating, about the environments that their servers are running in, both utilization and also the physical environment around them, and make that as easy as possible to do so people can have this rich data set that we collect for them and analyze.
Once you have this large data set, not only are we measuring and benchmarking someone’s infrastructure, we can now allow them to compare themselves to their peers. Your failure rate is lower, and here’s why it is: because you’re running at optimal temperature, your firmware is the latest version, and it’s more stable. Now that we have this type of comparison, we add a whole new layer of transparency into the industry, where people are making decisions based on actual data, informed decisions, not trying to guess what component is right for them.
Once you have that, you’ll quickly understand which vendors are right for you, which ones are not right for you, and you’re making much more informed decisions about this large amount of capital you’re about to deploy.
It adds a whole new layer of transparency to the industry, which I desperately wanted when I was at Facebook. I wanted to know if I should go to vendor X or Y. I didn’t have information, and when you ask [vendors] about quality of the product, you didn’t get a good answer. They gave you some mathematical formula they used to calculate [Mean Time Between Failures], but it didn’t actually correlate to what was in the field.
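Editor's illustration: the gap Michael describes between a quoted MTBF and what happens in the field can be made concrete with a little arithmetic. The sketch below is not Coolan's method. It assumes a constant failure rate (the exponential model commonly used with MTBF figures), and the function names and all numbers are hypothetical, chosen only to show the comparison.

```python
import math

HOURS_PER_YEAR = 8760  # 365 days * 24 hours


def afr_from_mtbf(mtbf_hours: float) -> float:
    """Annualized failure rate implied by a quoted MTBF.

    Assumes a constant failure rate (exponential model), which is an
    assumption of this sketch, not something stated in the interview.
    """
    return 1.0 - math.exp(-HOURS_PER_YEAR / mtbf_hours)


def observed_afr(failures: int, device_hours: float) -> float:
    """Failures per device-year actually seen in the field."""
    device_years = device_hours / HOURS_PER_YEAR
    return failures / device_years


# Hypothetical inputs: a vendor quotes a 1,000,000-hour MTBF, while an
# operator logs 30 failures across 1,000 devices that each ran for a year.
quoted = afr_from_mtbf(1_000_000)
field = observed_afr(failures=30, device_hours=1000 * HOURS_PER_YEAR)

print(f"AFR implied by quoted MTBF: {quoted:.2%}")
print(f"AFR observed in the field:  {field:.2%}")
```

With these made-up inputs, the field rate comes out several times higher than the rate implied by the quoted MTBF, which is the kind of mismatch that only shows up when operators measure failures themselves.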