In “Star Trek,” Captain Kirk never called down to the engine room to ask Mr. Scott for an update on warp drive efficiency or to check how much the Enterprise was spending on dilithium crystals. “Scotty!” Kirk barked. “I need MORE POWER!”
In the data center, more powerful servers can solve many problems. But as the cloud builders assemble vast “Internet-scale” platforms, effective system design becomes more complex than just running all the servers at warp nine.
James Hamilton has written a series of articles arguing that cost-effective scaling for huge cloud platforms requires new thinking – especially when it comes to servers. The problem, Hamilton says, is that CPU bandwidth is increasing far faster than memory bandwidth, causing performance bottlenecks. Enhancing the memory subsystem would cost more and use more power.
Hamilton’s counter-intuitive solution? “Just run the CPU slower,” he writes. “Internet-scale workloads are partitioned over 10s to 1000s of servers. Running more slightly slower servers is an option if it produces better price performance.”
Hamilton, who recently moved to Amazon after many years with Microsoft, makes his case in three posts at his Perspectives blog:
- The Case for Low-Cost, Low-Power Servers
- Microslice Servers
- Low Power Amdahl Blades for Data Intensive Computing
“Performance is the wrong metric,” Hamilton writes. “Most servers are sold on the basis of performance but I’ve long argued that single dimensional metrics like raw performance are the wrong measure. What we need to optimize for is work done per dollar and work done per joule (a watt-second). In a partitioned workload running over many servers, we shouldn’t care about or optimize for single server performance. What’s relevant is work done/$ and work done/joule.”
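To make the two metrics concrete, here is a minimal sketch in Python. The server names and every figure in it (throughput, price, power draw) are hypothetical values chosen for illustration. They do not come from Hamilton's posts or from any vendor. The sketch treats sustained requests per second as the unit of "work."

```python
from dataclasses import dataclass


@dataclass
class ServerProfile:
    """A hypothetical server; all numbers used below are illustrative only."""
    name: str
    requests_per_second: float  # sustained throughput under load
    cost_dollars: float         # purchase price
    power_watts: float          # average power draw under load

    def work_per_dollar(self) -> float:
        # Throughput bought per dollar of hardware spend.
        return self.requests_per_second / self.cost_dollars

    def work_per_joule(self) -> float:
        # One watt is one joule per second, so dividing requests/second
        # by watts gives requests completed per joule.
        return self.requests_per_second / self.power_watts


# Assumed figures: a fast, expensive server and a slower, cheaper one.
fast = ServerProfile("fast", requests_per_second=1000.0,
                     cost_dollars=4000.0, power_watts=400.0)
slow = ServerProfile("slow", requests_per_second=400.0,
                     cost_dollars=1000.0, power_watts=100.0)

for server in (fast, slow):
    print(f"{server.name}: "
          f"{server.work_per_dollar():.2f} req/s per dollar, "
          f"{server.work_per_joule():.2f} requests per joule")
```

With these assumed numbers, the slower server loses on raw performance but wins on both of Hamilton's metrics, which is the trade-off a partitioned workload can exploit by adding more machines.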
Hamilton discussed this approach in a paper he presented earlier this month at the Conference on Innovative Data Systems Research (CIDR). The paper is based on tests with a prototype of Rackable’s Microslice Server Architecture, which found the design could be more cost effective than the servers currently powering major cloud computing server farms.
How will this approach translate from the engine room to the bridge? Can Capt. Kirk overcome his craving for more power? There are lessons to be learned from Mr. Scott, who was a master of managing information and expectations, as well as warp drives. How did the chief engineer pull off those miraculous repairs so quickly in the original “Star Trek” episodes? Scotty spills the beans in Star Trek III: The Search for Spock:
KIRK: Mr. Scott. Have you always multiplied your repair estimates by a factor of four?
SCOTTY: Certainly, sir. How else can I keep my reputation as a miracle worker?