Are low-power Atom servers more efficient than cloud computing platforms in crunching large datasets? That's the case for eHarmony, which is now using servers from SeaMicro to power the data analysis for its singles matching service. The dating site had previously used the Amazon cloud computing platform to perform nightly data-crunching using Apache Hadoop.
eHarmony recently shifted its Hadoop processing to a cluster of SeaMicro SM10000 servers running in a colocation center. SeaMicro's hardware uses Intel’s low-power Atom chips, which have been widely used in mobile phones and laptops due to their energy efficiency. After extensive testing, eHarmony shifted its Hadoop jobs in-house, resulting in a "massive reduction in operational cost," the company said.
SeaMicro is one of several closely watched initiatives to adapt low-power processors for use in servers to help manage soaring power use in data centers. eHarmony was among the early users of SeaMicro's server, and will be featured in a case study of its implementation.
"Dramatic" Reduction in Expenses
"We were paying huge amounts for a few hours of compute in the cloud," says Cormac Twomey, director of software engineering at eHarmony. "We worked closely with SeaMicro to bring the SM10000 servers in-house, and since then we have enjoyed a dramatic reduction in operating expense and have seen a substantial reduction in variability around job completion times. We now have an additional 20 hours of compute per-day at our disposal."
eHarmony uses algorithms to analyze 29 different attributes of its member profiles and suggest matches. As its user base scaled to tens of millions of members, eHarmony turned to Hadoop, an open source technology that allows many small independent servers to work together. Hadoop enables applications to work with thousands of compute nodes and petabytes of data.
For several years, eHarmony ran its Hadoop operations in the cloud, which provided flexibility and scalability. But as its operations continued to grow, the company evaluated other options, including SeaMicro. eHarmony was able to purchase the SeaMicro SM10000 in a configuration that enabled its Hadoop application to complete its run in the same four-hour time frame it had been taking in the cloud.
eHarmony said the switch reduced its operating expenses by "tens of thousands of dollars a month," and its total cost of ownership (TCO) by 74 percent.
Network Fabric Key to Design
SeaMicro's server architecture combines low-power CPUs, compact motherboards and an interconnection and switching fabric. This approach allows SeaMicro to pack 512 Atom CPUs into a 10U form factor chassis, which uses about 2.5 kilowatts of power. The network fabric that links the 512 CPUs in the SM10000 can support Ethernet, Fibre Channel and data center Ethernet.
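To put those density figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above (512 CPUs, 10U, about 2.5 kilowatts); the function names are illustrative, and the per-CPU figure is simply each CPU's share of total chassis power, which also covers the fabric, memory, storage and cooling rather than the Atom chip alone.

```python
# Figures quoted for the SM10000; the 2.5 kW value is approximate.
CPUS_PER_CHASSIS = 512
CHASSIS_POWER_WATTS = 2500.0
CHASSIS_HEIGHT_U = 10


def watts_per_cpu(chassis_watts: float, cpus: int) -> float:
    """Share of total chassis power attributable to each CPU slot."""
    return chassis_watts / cpus


def cpus_per_rack_unit(cpus: int, height_u: int) -> float:
    """Average number of CPUs packed into each rack unit of height."""
    return cpus / height_u


if __name__ == "__main__":
    per_cpu = watts_per_cpu(CHASSIS_POWER_WATTS, CPUS_PER_CHASSIS)
    density = cpus_per_rack_unit(CPUS_PER_CHASSIS, CHASSIS_HEIGHT_U)
    print(f"Power per CPU (whole-chassis share): {per_cpu:.1f} W")
    print(f"CPUs per rack unit: {density:.1f}")
```

By this estimate, each CPU accounts for a little under 5 watts of chassis power, with roughly 51 CPUs per rack unit.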
SeaMicro gear isn't appropriate for every workload or application, but has performed well in delivering web traffic (including the Firefox 4 launch for Mozilla) and analytics.
"Hadoop is an ideal application for SeaMicro," says Anil Rao, vice president of product management, SeaMicro. "We have been able to show Hadoop users how to improve their time-to-job completion while reducing their power consumption, space consumption and slashing operating expense. It is very exciting to be able to demonstrate such significant savings for an industry leader like eHarmony."
eHarmony's Twomey said the shift to SeaMicro provided a number of operational advantages beyond the immediate cost and power savings.
One was predictability. In eHarmony's cloud environment, the time Hadoop took to complete its run varied, which the company attributed to the shared nature of the cloud, where computing power and network bandwidth are divided among customers competing for resources. Over time, that made it increasingly difficult to predict when the matching data could be delivered to other parts of the organization. Running SeaMicro in a dedicated environment allowed the operations team to meet its timetables and offer SLAs to other units within the company.
The switch also eliminated data upload charges from cloud providers, and reduced data retrieval latency between its Hadoop and memcached applications by running both in the same SeaMicro system.
SeaMicro "delivered our Hadoop work more reliably, impressed our colleagues with an internal SLA, and had more compute time available to refine and improve our methodologies," said Twomey.