Real Innovation is in the Applications
Dr. Joe Polastre is co-founder and chief technology officer at Sentilla, a company that provides enterprise software for managing power and performance in the data center. Joe is an energy efficiency evangelist and defines the company’s technology and product strategy.
Over the past few years, the data center industry has gotten smarter about power and cooling. Operators are adopting hot- and cold-aisle containment, fresh-air cooling, water- and air-side economizers, and bypass UPS systems. All of these are common-sense techniques aimed at lowering the overhead of data center operations, and while most data centers had a PUE above 2.0 a few years ago, new and modernized data centers now routinely land in the 1.2 to 1.4 range.
What this means is that roughly 80% of power is now being consumed by IT equipment, with the power and cooling infrastructure consuming the other 20%. Yet we still keep talking about cooling innovations and continue to overlook some disturbing hardware and software trends. This is a classic case of the 80/20 rule: Why spend your time optimizing the 20% when the 80% is consuming the power? That 80% is the IT load, responsible for performing the useful work of the business.
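The 80/20 split falls straight out of the definition of PUE (total facility power divided by IT power): the IT share of the bill is simply 1/PUE. A quick sketch of the arithmetic:

```python
def it_load_fraction(pue: float) -> float:
    # PUE = total facility power / IT power, so IT's share of the total is 1/PUE.
    return 1.0 / pue

# At the PUE values discussed in the article:
for pue in (2.0, 1.4, 1.25, 1.2):
    print(f"PUE {pue}: IT consumes {it_load_fraction(pue):.0%} of facility power")
```

At a PUE of 1.25 the IT load is exactly 80% of the bill, which is why the attention belongs on the applications doing the work.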
Why Cooling is No Longer Needed
There’s an important trend going on among server, storage, and networking vendors: They are routinely exceeding ASHRAE’s recommended limits. Dell, HP, and Cisco produce servers warrantied to 95F, and SGI to 104F. If you ask Intel or the server vendors, they will even begrudgingly give you new fan-control code that keeps the fans at lower RPMs when exposed to higher temperatures. Add to that the fact that racks are now grounded, so there’s little worry of static discharge and thus humidity isn’t as much of an issue. Therefore, data centers can run up to 104F with minimal humidity control, allowing cooling expenses to be significantly cut.
Let’s explore this idea with The Green Grid’s free online cooling calculator. Set the drybulb threshold at 40C/104F. Enter the zip code with the warmest sustained temperatures in North America: 92328, Death Valley, California. We find there are 8,584 free-air cooling hours in Death Valley. That’s more than 357 days per year! Instead of building a cooling plant, move applications elsewhere for the remaining 176 hours, about a week, each year.
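Under the hood, a free-cooling calculation like this is just counting the hours of the year when outside air is at or below the drybulb threshold. A minimal sketch (the temperature series here is synthetic, not real Death Valley weather data):

```python
import math

def free_cooling_hours(hourly_drybulb_f, threshold_f=104.0):
    # An hour is "free" if outside air alone is cool enough for the data center.
    return sum(1 for t in hourly_drybulb_f if t <= threshold_f)

# Synthetic year: a seasonal swing around 80F plus a daily swing (8,760 hours).
temps = [80 + 25 * math.sin(2 * math.pi * h / 8760) + 12 * math.sin(2 * math.pi * h / 24)
         for h in range(8760)]
hours = free_cooling_hours(temps, threshold_f=104.0)
print(f"{hours} free-cooling hours, i.e. {hours / 24:.0f} days per year")
```

Raise the threshold and the free hours climb toward the whole year; that is exactly the lever the warrantied 95F and 104F limits give you.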
Innovative Software Drives Data Center Efficiency
Now it is time to stop building applications like we did in the mainframe days, and instead build modern, modular services. There’s a shift going on in how applications are written, deployed, and managed. Enterprise applications have typically been built with different pieces of the application residing on different systems: the database, the web server, the business intelligence platform, and so on. This necessitated 2N redundancy, with a full copy of each major component running to support failover in case of an issue. That’s the old way of building applications.
In the last 10 years, there’s been tremendous innovation in software that has been enabled by the emergence of commodity, inexpensive servers. Enormous mechanical and electrical innovation has delivered high quality, high performance, and low cost computing systems. And so, software developers started to take a different approach to building applications: Instead of worrying about the very expensive computing resource, treat servers as disposable and abundant. Expect that the hardware will fail, and re-architect to embrace the vast resources at your disposal. This philosophy is the core of what I consider to be “Cloud Computing”.
Google is the leading innovator when it comes to software development in this manner. MapReduce and its open-source follow-on, Hadoop, have dramatically changed the way modern applications and services are developed and deployed. Instead of a “componentized” system, identical workers each process a slice of the data in parallel. If any single system fails, the performance of the application degrades, but the service keeps running. In this model we only need N+1 redundancy, not 2N. Don’t worry about fixing the failed system: throw it out! Bring up a new instance of the application, performance recovers, and most users aren’t even aware that anything happened.
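The pattern is easy to see in a toy word count written MapReduce-style. A real framework’s worker pool, shuffling, and fault handling are elided here, but the shape is the same: identical map tasks over slices of the data, merged by a reduce step.

```python
from collections import Counter
from functools import reduce

def map_phase(chunk: str) -> Counter:
    # Each worker independently counts words in its own slice of the input.
    return Counter(chunk.split())

def reduce_phase(partials) -> Counter:
    # Merge the partial counts. If a worker dies, its chunk is simply re-run
    # on a fresh instance; no other worker's state is affected.
    return reduce(lambda a, b: a + b, partials, Counter())

chunks = ["the data center", "the cloud", "data data everywhere"]
totals = reduce_phase(map_phase(c) for c in chunks)
print(totals["data"])  # "data" appears three times across the chunks
```

Because every chunk is processed independently, losing a node costs throughput, not correctness: that is what makes N+1 sufficient.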
With innovative application architectures like this, services are now truly independent of where they run (of course, financial trading is a notable exception). Workloads can run where power is cheapest, cooling isn’t needed, and resources are available. Applications can quickly re-provision without the user even knowing. And enterprises can deliver monumental new online services at a fraction of the cost.
If the data center’s sole purpose is to deliver services to its users or business, then why do we keep rehashing age-old common-sense cooling strategies? We should be talking about the applications, because that’s where real innovation and efficiency lies. Make the applications more efficient, and the rest of the infrastructure will reap the rewards.
MapReduce and Hadoop aren’t the only innovative new approaches to building efficient applications. Google File System, Google App Engine, the suite of Amazon Web Services including SimpleDB, and Facebook’s HipHop are just a few of the technologies that are also revolutionizing the way applications are made. You can even trace the roots of this new paradigm back to the Network of Workstations project at Berkeley. If you want a peek at the cutting edge of the future, check out the RADLab, sponsored by nearly every major software company, including Google, Microsoft, Oracle, SAP, Amazon, Facebook, eBay, and VMware.
Industry Perspectives is a content channel at Data Center Knowledge highlighting thought leadership in the data center arena.
While improving applications is an important part of the overall strategy to improve the “Big Picture” of data center energy efficiency and computing productivity, it would seem some of the “facts” the article cites are erroneous or misleading.
The majority of data centers are not operating at or near a PUE of 1.2. Most are still at or near 2.0, and many are still at 2.5 or more (I am not suggesting this is good, just more realistic). Certainly it does not support the suggestion that power and cooling use only 20% of the energy, “so why spend time to optimize it.” Even newer sites such as NetApp’s Raleigh, NC data center, which exceeds the ASHRAE TC 9.9 recommendation and was awarded the first Energy Star certification, operates at a PUE of 1.35.
Moreover, statements like “Racks are now grounded, so there is little worry of static discharge…” seem to imply that racks were not grounded before?
And to suggest altering the fan controller parameters to keep the fans from speeding up at 104F will only ensure that the hardware does “fail” and get “thrown out”: without the proper airflow from the high fan speeds needed at 104F, or even at 95F, the equipment cannot shed its heat.
NEBS equipment can operate at 104F and higher, but they have much higher fan airflow requirements to keep transferring the heat from the chips using more air since there is a lower Delta T at the heat sinks.
Clearly there are many ways to improve efficiency by broadening the environmental envelope. However, we still want to ensure that we are not operating a server at 104F, so that we have a reasonable “thermal buffer” and the equipment does not shut down should there be the slightest increase in temperature or a momentary loss of cooling.
You suggest that the applications simply move (to an operational site) when the lack of free-cooling days/temps causes the data center to shut down. Yahoo- and Google-type “free” public search applications can tolerate this and do it, but most enterprise data centers still have many applications that cannot.
Once (and if) ALL applications and ALL data storage can be instantly shifted without losing data or dropping a client session, your “follow the cool” approach may be technically feasible, but not necessarily preferable, despite the fact that Microsoft’s “to the cloud” ads seem to be the current Utopian computing ideal of the decade.
So I would respectfully suggest that until we do get typical data centers to run at a PUE of 1.2 or better, we should continue to try to improve the energy efficiency of our infrastructure systems.
Some of your points are valid, although we are seeing a trend of enterprise data centers running well below a PUE of 2.0. While a study from 2008 put the average at about 2.0-2.1, most are now operating in the 1.6-1.8 range, and a large number are below 1.4.
I think you’re missing the point, though. The infrastructure is only there to support the applications. Why aren’t you pushing your application vendors to create more efficient and sustainable software? Facebook reduced the number of servers necessary for its infrastructure by an estimated 30% just by using a better compiler, HipHop. If you can cut that kind of waste out of the IT demand, you cut a proportional amount out of the cooling and power distribution waste.
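To a first approximation (assuming cooling and distribution overhead scale with the IT draw, which ignores fixed losses), every kilowatt trimmed from the IT load removes PUE kilowatts at the meter. The numbers below are illustrative, not Facebook’s actual figures:

```python
def facility_power_saved_kw(it_load_kw: float, it_reduction: float, pue: float) -> float:
    # Overhead is assumed proportional to IT load, so facility savings = IT savings * PUE.
    return it_load_kw * it_reduction * pue

# Hypothetical: a 1,000 kW IT load, a 30% software-driven reduction, PUE 1.5.
print(f"{facility_power_saved_kw(1000, 0.30, 1.5):.0f} kW saved at the utility meter")
```

The worse the PUE, the bigger the multiplier, so software efficiency pays off most in exactly the facilities you describe.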
Maybe we can’t run at 104F, but we can run at 92-95F. Besides, how much of a thermal buffer do you think is necessary? Have you measured the rate of rise in your data center, or your server mortality rates? Could you even respond to a cooling outage fast enough to keep the temperature “within bounds” if you needed to? In most data centers the answer is no.
While we’re not there yet on moving the applications, it is coming. Yes, applications today are often poorly written and still subscribe to the mainframe methodology, but the move to cloud provides an opportunity to re-architect applications to be more robust to failure. Even large enterprise software companies like SAP have begun this transition.
Infrastructure optimization is not the wave of the future. It is a band-aid on an old way of thinking about how to run IT services, and with the evolution of applications and their architectures, combined with air-side economization and fresh air cooling, it becomes less important every day.
I am sorry that I sounded so overly critical. I do believe that your underlying message about architecting applications that are more location independent and can be inherently designed to operate “in the cloud” is a worthy goal. However, it has been my experience that this seamless ethereal software nirvana has yet to materialize in the typical enterprise data center. http://www.ctoedge.com/content/do-you-know-where-your-data-center
In point of fact, I consider myself somewhat of an energy efficiency evangelist, and I have been writing, presenting, and webcasting for the last several years about operating the data center beyond the original “cast in stone” 70F / 50% RH mantra.
As far as thermal reserves go, many data centers have large chilled-water reserves (this is especially critical for blade servers that run 20-25 kW per cabinet), as well as redundant chillers and independent redundant chiller loops. Moreover, even those that use DX CRACs for more traditional densities have N+1 to N+5 redundancy, since the units are all independent and there is no SPOF, just to avoid a catastrophic cooling scenario.
While I don’t believe the industry as a whole is ready to jump to 90+ F in the cold aisle, the ASHRAE TC 9.9 “recommended envelope” of 80.6F has done a lot to make data center operators move the dial into the mid-to-upper 70s and not get fired.
The Energy Star for Data Centers program had over 100 data centers measure their PUE for 11-12 months to create the baseline for the program, and the majority of the sites had a preliminary PUE above 1.75, with an average PUE of 1.91. Moreover, some were at a PUE of 3-4. (Note: this metric was originally called EUE during the testing phase.) http://www.ctoedge.com/content/move-over-pue-here-comes-eue
So perhaps if you take the High Road (perfectly efficient software) and I take the Low Road (improving the infrastructure), we can all come together in the creation of the utopian data center with a PUE of 0.5 (because we are actually recovering and re-using the waste heat), as well as shifting the computing load wherever and whenever it is most energy efficient, anywhere in the world, on demand. Until then, let’s keep working at improving the majority of data centers that are at a PUE of 1.5 and above.
A very illustrative discussion. I do believe we are all pushing innovation and power efficiency, and I hope that someday in the near future data center managers and IT staff will be aware of all these trends.