The scientific community explores the world at every scale, from the cosmos and its origins to things too small for the human eye to perceive, such as cells and the human genome. These are the two ends of the spectrum of scientific inquiry, and now more than ever, both extremes (and many inquiries in between) require ever more scientific computing and storage.
This demand is driving IT facilities design at many academic institutions, according to James Cuff, Assistant Dean for Research Computing in the Faculty of Arts and Sciences at Harvard University. Cuff, who previously worked on the human genome project at the Wellcome Trust Sanger Institute, will be one of the keynote speakers at the spring Data Center World Global Conference 2015 in Las Vegas. He will discuss the current and future state of the data center, which is being called on to do more work and do it more efficiently.
Most data center managers share this pain point. Academics are not the only ones experiencing a strong uptick in demand for compute and storage: enterprises, service providers, government organizations and the like are seeing it too.
“Data centers are the backbone of civilization,” Cuff said in a phone interview from his office in Cambridge. “Basic science is being done through computing. We have researchers who are modeling the early universe.” He added that a new microscope has come online that produces 3 terabytes of data per hour. (Talk about a storage challenge!) At Harvard, Cuff said, scientific computing power has grown over time from core counts in the “hundreds” to more than 60,000 CPUs today. Storage currently stands at about 15 petabytes.
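To put the microscope figure in perspective, here is a rough back-of-the-envelope sketch. It assumes the instrument records around the clock and that decimal units apply (1 PB = 1,000 TB). Neither assumption comes from Cuff, so treat the result as an upper bound on the data rate rather than a measured one.

```python
# Back-of-the-envelope storage arithmetic using the figures quoted above.
# Assumptions (not from the article): the microscope records 24 hours a day,
# and decimal units are used (1 PB = 1,000 TB).

TB_PER_HOUR = 3          # microscope output quoted by Cuff
TOTAL_STORAGE_PB = 15    # Harvard's current storage, per the article
TB_PER_PB = 1000


def daily_output_tb(tb_per_hour: float, hours_per_day: float = 24) -> float:
    """Data produced per day, in terabytes."""
    return tb_per_hour * hours_per_day


def days_to_fill(total_pb: float, tb_per_day: float) -> float:
    """Days one instrument would need to write `total_pb` petabytes."""
    return total_pb * TB_PER_PB / tb_per_day


if __name__ == "__main__":
    per_day = daily_output_tb(TB_PER_HOUR)
    days = days_to_fill(TOTAL_STORAGE_PB, per_day)
    print(f"{per_day:.0f} TB per day; {days:.0f} days to write {TOTAL_STORAGE_PB} PB")
```

Under those assumptions, a single instrument would generate 72 TB a day and could write the equivalent of the entire current storage footprint in well under a year.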
Looking Ahead Leads to Collaboration
Seeing the ever-expanding need for compute and storage, as well as the scientific inquiry driving it, Harvard and four other research institutions (Boston University, MIT, Northeastern University, and the University of Massachusetts) collaborated on a new data center facility in western Massachusetts, the Massachusetts Green High Performance Computing Center (MGHPCC). The facility, which is run by a non-profit owned by all five universities, is located where hydroelectric power is available. The data center was built to provide 10 megawatts, but so far has deployed 5 MW.
The key to consolidating disparate computing resources into a more efficient shared facility was to “build trust with the community,” Cuff said.
“It goes back to the old days. With mainframes, the scientific community had to share,” he said. “Then the PC blew the doors off it. Until researchers found that one computer was not enough to get everything done. Then, they networked machines together.” The next tipping point has arrived: the number of networked machines has grown significantly, and they now require specialized power, cooling and monitoring. This led to consolidating computing resources and sharing the computers again. The process started slowly with one area, Life Sciences, and included “walking across the quad with computers under our arms at times,” Cuff said. New equipment was also used to entice researchers to move toward consolidation. Today, computing resources are spelled out in a faculty member’s offer of employment letter.
The ability to provide a new facility and make it more energy efficient was also very appealing to the group of institutions. “In Cambridge, energy comes from coal, oil, and non-renewable resources,” Cuff said. “In Holyoke, it was an old mill town with a massive dam.”
The other benefit of the site was its proximity to Interstate 90 (known as the Mass Pike), which has a high-speed fiber-optic network running along it. “The different universities just use different wavelengths of light on the fiber,” Cuff explained.
Currently, Cuff uses three flavors of facility to meet the computing needs of the faculty and staff:
- Service provider in Cambridge/Boston – high-reliability site for “the crown jewels” (price of energy 15 to 16 cents per kilowatt-hour)
- MGHPCC in Holyoke – “cheap and cheerful” and less reliable (20 percent uninterruptible power supply), but it offers a great amount of capacity and gives users access to full computers (price of energy 8 to 9 cents per kWh). Harvard has 22/24 rack pods with hot aisle containment.
- Public clouds such as AWS – instant, easy or temporary workloads
This setup lets the academic team take advantage of the fact that some computing requirements are transient. Workloads can be sent to the location that suits them and run where and when it is most cost-efficient to do so. This works in research because many jobs are not especially time-sensitive.
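The placement logic described above can be sketched as a small rule table. This is an illustrative sketch only, not Harvard's scheduler: the `Job` fields, the site keys, and the rule order are hypothetical, and the energy prices are simply the midpoints of the ranges quoted in the list. No price is modeled for the public cloud because the article gives none.

```python
from dataclasses import dataclass

# Energy prices in cents per kWh: midpoints of the ranges quoted in the
# article (15 to 16 and 8 to 9). Using midpoints is an assumption.
ENERGY_CENTS_PER_KWH = {
    "cambridge": 15.5,   # high-reliability service provider
    "mghpcc": 8.5,       # Holyoke, "cheap and cheerful"
}


@dataclass(frozen=True)
class Job:
    """Hypothetical job description used only for this sketch."""
    name: str
    critical: bool = False   # "crown jewels": needs the high-reliability site
    urgent: bool = False     # instant, easy or temporary workload


def place(job: Job) -> str:
    """Pick a facility for a job; reliability needs win over urgency."""
    if job.critical:
        return "cambridge"
    if job.urgent:
        return "public-cloud"
    # Everything else can wait for the cheapest bulk capacity.
    return "mghpcc"


def energy_cost_dollars(kwh: float, site: str) -> float:
    """Energy cost of running `kwh` at one of the two priced sites."""
    return kwh * ENERGY_CENTS_PER_KWH[site] / 100


if __name__ == "__main__":
    for job in (Job("payroll-db", critical=True),
                Job("demo-burst", urgent=True),
                Job("genome-assembly")):
        print(job.name, "->", place(job))
    saving = energy_cost_dollars(1000, "cambridge") - energy_cost_dollars(1000, "mghpcc")
    print(f"Saving per 1,000 kWh moved to Holyoke: ${saving:.2f}")
```

At those midpoint prices, every 1,000 kWh of non-urgent work moved from Cambridge to Holyoke costs about $70 less in energy, which is why jobs that can wait are the natural candidates for the cheaper site.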
“We have tiered storage, including EMC gear and file systems that are built in house. So we get both vendor support and internal support on storage,” he said.
Monitoring, Power Management and Orchestration
“New chipsets are allowing for throttling of energy use during compute cycle, so a job that would run for two months could be cut back to go for two months and a week. That would make an energy difference,” Cuff explained. He now actively watches power usage through rack-level monitoring. Previously, he was less aware of energy usage. “Facilities paid the power bill. I was not incentivized to conserve. The CIOs should get the energy bills handed to them,” he said.
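The trade-off Cuff describes comes down to simple arithmetic: energy is average power multiplied by run time, so a longer throttled run only saves energy if power falls by more than the run time grows. The sketch below works that out for his example, assuming "two months" means roughly 61 days. The 20 percent power reduction in the usage example is purely hypothetical, since the article gives no wattage figures.

```python
# Energy = average power x run time. A throttled job uses less energy only
# if its average power falls by more than its run time grows.
# Assumption (not from the article): "two months" is taken as 61 days.

BASE_DAYS = 61
THROTTLED_DAYS = BASE_DAYS + 7   # "two months and a week"


def break_even_power_ratio(base_days: float, throttled_days: float) -> float:
    """Throttled/baseline power ratio at which energy use is unchanged."""
    return base_days / throttled_days


def energy_saved_fraction(power_ratio: float,
                          base_days: float,
                          throttled_days: float) -> float:
    """Fraction of baseline energy saved (negative means more energy used)."""
    return 1 - power_ratio * throttled_days / base_days


if __name__ == "__main__":
    ratio = break_even_power_ratio(BASE_DAYS, THROTTLED_DAYS)
    print(f"Break-even power ratio: {ratio:.3f}")
    # Hypothetical example: throttling cuts average power by 20 percent.
    saved = energy_saved_fraction(0.80, BASE_DAYS, THROTTLED_DAYS)
    print(f"Energy saved at 80% power: {saved:.1%}")
```

In other words, the extra week pays off only if throttling cuts average power draw by more than about 10 percent; a smaller cut would leave the job using more energy overall, not less.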
Using the MGHPCC also allows Harvard to standardize on vendor platforms and management tools. “We use Puppet for orchestration layer,” Cuff said. “We’d be dead in the water without orchestration software. We have an army of machines that all look like their friends, and if there is one that is different we can identify it quickly.”
What Lies in the Future
“We used to say that we were growing by 200 kilowatts every six months,” he said. Now, at a monthly meeting, he is asked “how many racks” will be added in a given month.
“There’s a steep curve,” he said. “Once adoption happens, things start to pick up speed. We have now added the School of Engineering and the School of Public Health. In nine months, we will be looking at our next stage of design. The MGHPCC is about 50 percent occupied. I expect we will need about 40K more CPUs and the storage requirement is expected to grow as well.”
To hear more about the high performance computing facility that Harvard is using, and more case studies of the science it supports, attend Cuff’s keynote session at the spring Data Center World Global Conference in Las Vegas. Learn more and register at the Data Center World website.