One runs a popular service on just 350 servers, while another likely has more than a million servers. The common denominator: major traffic. Executives from six of the web’s most popular properties – Google, Microsoft, Yahoo, Facebook, MySpace and LinkedIn – shared the stage at Structure 09 yesterday to discuss their infrastructure and innovations.
Managing a megasite requires plenty of hardware. But that’s not the secret sauce, according to Vijay Gill, the Senior Manager of Engineering and Architecture at Google (GOOG). “The key is not the data centers,” said Gill. “Those are just atoms. Any idiot can build atoms and have this vast infrastructure. How you optimize it – those are the hard parts. It takes an insane amount of will.”
The challenges faced by the six sites varied. “I’m taking a minimalist approach,” said Lloyd Taylor, the VP of Technical Operations for the LinkedIn social network. “How little infrastructure can we use to run this? The whole (LinkedIn) site runs on about 350 servers.” That’s due largely to the fact that much of the content served by LinkedIn consists of profiles and discussion groups that are heavy on text. “We’re not a media intensive site,” said Taylor.
Not so for Google, which operates the YouTube video portal. Google says YouTube users upload 10 hours of video content every minute. “We realized we couldn’t build the capacity as fast as we needed it,” said Gill, who noted that Google engineers developed a sophisticated distributed caching system that instantly determines a user’s location and serves YouTube videos from a local cache. “You cannot outsource this,” said Gill. “We have to do this in-house because it’s our core competency.”
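Gill did not describe how the system works internally, but the core idea of routing a user to the nearest cache can be sketched in a few lines. Everything below is a toy illustration, not Google’s implementation: the node names, coordinates, and the great-circle distance heuristic are all assumptions.

```python
import math

# Hypothetical cache nodes: (name, latitude, longitude) -- illustrative only.
CACHE_NODES = [
    ("us-east", 39.0, -77.5),   # roughly northern Virginia
    ("eu-west", 53.3, -6.3),    # roughly Dublin
    ("ap-south", 1.35, 103.8),  # roughly Singapore
]

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearest_cache(user_lat, user_lon):
    """Pick the cache node geographically closest to the user's estimated location."""
    return min(
        CACHE_NODES,
        key=lambda node: haversine_km(user_lat, user_lon, node[1], node[2]),
    )[0]
```

A real system would estimate location from the resolver or client IP and fold in cache load and link cost, but the selection step reduces to a nearest-node lookup like this one: a viewer in New York (`nearest_cache(40.7, -74.0)`) maps to the US node, while a viewer in Paris maps to the European one.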
‘No Religion About It’
Yahoo takes a different approach to content delivery, according to VP of Global Networks Raj Patel. “We use a mix of approaches for CDN and caching,” said Patel. “It’s indispensable to what we do, but there is no religion about doing it ourselves. It’s based on the economics of what we do, as well as performance. There’s a very direct tie-in from performance to business revenue.”
Microsoft has invested heavily in building an Edge Computing Network, but continues to use major commercial content delivery providers, including Akamai Technologies (AKAM), Limelight Networks (LLNW) and Level 3 (LVLT). “The challenge we end up with is that we have all sorts of applications,” said Najam Ahmad, the General Manager of Global Networking Services for Microsoft. “You can’t handle all these applications in the same way. We end up with a varied mix of our own capabilities and CDNs.”
Ahmad said that shifting applications to its in-house Edge Computing Network has produced an 80 percent performance improvement in some applications. “That’s why it’s a competency we need to have,” he said.
Human Error … Still
Gill said the toughest challenges are not related to hardware. “The major problems are human error and software error,” said Gill. “They always have been, and I believe they always will be.”
What’s on the wish list for the megasite minders? “If I could have one thing I don’t have right now, it would be mass-scale, super-fast storage,” said Richard Buckingham, the VP of Technical Operations at MySpace. Buckingham said MySpace has been testing flash storage technology from Fusion-io. “It’s pretty ground-breaking and revolutionary,” he said.
Facebook VP of Technical Operations Jonathan Heiliger served as the moderator of the panel. In an earlier presentation, he noted the importance of investing in infrastructure to drive site performance.
Google’s Gill agreed. “We have a saying: speed costs money; how fast do you want to go?” said Gill. “And we want to go very fast indeed.”