This month, we focus on the open source data center. From innovation at every physical layer of the data center coming out of Facebook’s Open Compute Project to the revolution in the way developers treat IT infrastructure that’s being driven by application containers, open source is changing the data center throughout the entire stack. This March, we zero in on some of those changes to get a better understanding of the pervasive open source data center.
Sometime in the early 2000s, Amir Michael responded to a Craigslist ad that was advertising a data center technician job at a company whose name was not mentioned. He applied, and the company turned out to be Google. After years of fixing and then designing servers for Google data centers, Michael joined Facebook, which was at the time just embarking on its journey of conversion from a web company that was running on off-the-shelf gear in colocation data centers to an all-custom hyperscale infrastructure.
He was one of the people who led those efforts at Facebook, designing servers, flying to Taiwan to negotiate with hardware manufacturers, doing everything to make sure the world’s largest social network didn’t overspend on infrastructure. He later co-founded the Open Compute Project, the Facebook-led effort to apply the ethos of open source software to hardware and data center design.
Today, he is the founder and CEO of Coolan, a startup whose software uses analytics to show companies how effective their choices of data center components are and helps them make more informed infrastructure buying and management decisions.
We caught up with Michael last week after his keynote at the Data Center World Global conference in Las Vegas to talk about the problems of adoption of OCP hardware and data center design principles by traditional enterprise IT shops, and about the project’s overall progress in light of Google recently becoming a member, making Amazon the last major US-based hyperscale data center operator that has yet to join.
Here’s the first of multiple parts of our interview with Michael.
Data Center Knowledge: There has been a lot of talk about the importance of OCP to the world of traditional enterprise IT, but we haven’t seen much adoption of OCP servers in that space besides a handful of large companies, such as Goldman Sachs or Fidelity Investments. Is OCP really a compelling story for the smaller enterprise IT team?
Amir Michael: The idea behind OCP is taking a lot of the best practices and pushing them into the rest of the data center market. When it comes to enterprise, that’s a challenge. A lot of them are still on standard solutions. The area of interest for OCP there is actually starting the conversations with them. If they are engaged – almost regardless of whether they’re buying OCP solutions or not – they’re going to start to ask the right questions of their vendors as well. Maybe the end result is that they end up buying OCP gear, which is great, but the important part is that they buy efficient gear. And it can be OCP gear, or maybe they go and ask their current vendors to go and build gear that has a lot of the same principles OCP has, and that’s a win as well.
This may be what ultimately pushes these best practices further into the enterprise space and further into the vendors where it’s not acceptable to build inefficient solutions anymore. People don’t want that.
OCP’s efforts to try and engage enterprises more through adoption of the motherboard [for standard 19-inch servers], making OCP systems that are easier for them to consume, I think it’s a great way of getting that conversation started. The systems don’t have all the same benefits as 100 percent pure OCP gear does – [the gear] that is powering Facebook’s or Microsoft’s data centers or whoever else – but I think that having that conversation piece at the table, whether or not they adopt it, is extremely important.
DCK: Even if forward-looking enterprise IT leaders that have the budget want to use OCP servers, it’s not as simple as ordering some from a vendor. What are the big barriers to OCP adoption in the enterprise?
AM: It comes from two sides. One is on the solution side, the vendors. The other one is on the consumer side, the end users. Can they adapt? Can they change organizationally? How are they structured? When you talk about IT, there’s a lot more legacy that they have to deal with, many more constraints. Legacy policies, legacy architectures, old code bases, old facilities, even simple things that we’ve come across, like doors not being tall enough in a data center to allow an OCP rack to roll through. It can be something as basic as that. Those things need to be addressed, and those are sort of one-offs. OCP can help by having a lot of great messaging out there that motivates a lot of these end users.
On the other side, the vendor side, a lot of support needs to come into play when you’re dealing with traditional IT. And it’s very basic things sometimes, like warranty support, replacements, having enough bandwidth and capacity to help these organizations deal with technical issues, which doesn’t quite exist from a lot of the OCP vendors today.
And it’s understandable. They’re dealing with much larger customer bases right now. If they want to continue to branch down to smaller deployments and to enterprises, it takes time to build those, and the [incumbent IT suppliers] today are great at doing that. They have larger organizations. Some of the OCP vendors will have to start looking more like the OEMs (Original Equipment Manufacturers, which is another way to refer to traditional IT suppliers) to move into there.
That’s a lot of what’s missing today, preventing OCP from being adopted downstream. Service providers, but also the manufacturers [should be] more open around providing BIOS updates, allowing people to post those, being more in the open source community and opening up more of those code bases, so people can start doing derivations of them and hosting their own updates on their websites. If it starts to look more like Linux, where you get the user base driving those developments, it’ll eventually take foothold. And it’s starting to – there’s OpenBMC now, there’s OpenBIOS – but it’ll take some time.
DCK: We’ve heard consistently that OCP servers themselves aren’t that easy to get if you’re not buying at the same volume as a Facebook or a Microsoft. Do you think that will change?
AM: If you’re a vendor [of OCP hardware], you’re looking for a big account initially. I’m going to do OCP, I’m going to invest a lot of time into it. Well, I want to sell a lot of servers too. At some point, the long tail is going to have to be taken over too, and that’s a different type of operation. They’re still getting their feet wet with some of the larger providers, and eventually someone’s going to need to make a commitment [to smaller users]. I believe there is desire to do that.
Buying OCP gear is tough. A lot of the supply chain is driven by large vendors who are making huge purchases, and a lot of the smaller guys are on the tail of that, taking off chunks of what’s being manufactured. It’s very capital-intensive to buy gear and have it ready and store it for when someone wants to come around and buy 100 servers and have them delivered in a reasonable timeframe. That vendor with a 100-server order is going to wait until Facebook places a big order and then will tack on 100 servers on top of that. A lot of these service providers run very lean, and they don’t like to carry inventory. It’s expensive. It sits there, and it amortizes, and it’s not good on their books. Someone with a lot of capital can come in and solve that problem. That doesn’t exist yet today.
DCK: After years of being one of the big hold-outs, Google finally joined OCP this month and donated their data center rack design with 48V DC power distribution. Do you think that’s an important development?
AM: Personally, it was fun to see. A lot of the same people I used to work with when I was [at Google] have joined back up with me again, and I get to work with them. They’re a very smart, bright team. They have a lot of innovations behind the curtain that they haven’t shared yet. I’m sure of that just by knowing the team. There’s all kinds of work I’m sure they’ve been doing over the last [several] years to optimize even further, and I think for them to start the conversation with the rest of the industry is extremely important.
The conversation will develop gradually over time. Right now they’re sharing a small slice, a small innovation that they’ve created in their infrastructure, and I think they’ll hopefully build an appreciation for the other operators that are there and realize that there’s other good learnings they can take from OCP as well, and start sharing more of what they’ve built. And I think that’s when it will be beneficial for them and beneficial for the rest of the community.
I think at this point a lot of the low-hanging fruit has been plucked by their competition as well, so the advantage they had by having more efficient infrastructure may not be as significant as it once was, and maybe they realized that. Maybe now we can share more for the greater good. The industry is more efficient, and everyone benefits at that point.
DCK: Microsoft joined two years ago, followed by Google this year. The company that seems to be a natural fit for OCP but hasn’t yet joined is Amazon. Do you think it’s important that they join too?
AM: [There is] still a black box around what [Amazon’s] infrastructure looks like, what they built, how efficient they are. Even myself, as a customer of Amazon, I’d like to know more around these systems that I’m deploying and how efficient they are. I’d like to make sure that I’m a good steward as well, and they don’t have that yet.
If you think about cloud and the competition there, it really is a race to the bottom, as far as costs go, and it will be cut-throat. At the end of the day, what will differentiate the different cloud offerings isn’t the level of efficiency of infrastructure, because they’re all going to bottom out somewhere similar, as far as cost goes. What will differentiate them is the level of service they provide, the number of applications they provide on top of their platform. That’ll draw users one way or another onto the different cloud platforms.
So, I think this focus on infrastructure cost is somewhat misleading now. They shouldn’t be focusing on it. They should be as efficient as possible – everyone else is as efficient as possible – and they should focus more on product features, not on just cutting cost of their own data centers.
There’s too much at stake today not to share that. The growth of the industry is too large. The amount of power being consumed by data centers is too large. And it goes beyond just having the most efficient infrastructure. You need to empower everyone to have that efficient infrastructure, so sharing that is important.
I do somewhat pick on [Amazon] for not being open about it, for still being of the mindset that it’s a differentiator for them. Google got beyond that. That was huge. They were probably one of the more secretive ones. It is something they [Amazon] need to improve upon. On the flip side, it’s [about being] open to listening too. You have to go both ways.