• The Web Creaks as Jackson Fans Mourn

    June 25th, 2009 : Rich Miller

    A chart from Keynote Systems showing the increase in response time (blue line) and decline in availability (red line) for major news sites after the death of Michael Jackson earlier today.

    News sites and social media posting the earliest reports of the death of pop star Michael Jackson experienced availability problems as web users around the world sought to confirm the news and learn more. TechCrunch reports that TMZ.com, which was the first to report Jackson’s passing, was soon knocked offline. Twitter soon struggled to remain available as the volume of messages surged, and turned off search features in account profiles to help manage their server load.

    As the news of Jackson’s death circulated, the traffic jam spread to more large news sites. Keynote Systems reported this evening that its monitoring showed performance problems for the web sites of ABC, AOL, CBS, MSNBC, NBC, SF Chronicle, and Yahoo! News. “Beginning at 5:30pm (EDT), the average speed for downloading news sites doubled from less than four seconds to almost nine seconds,” said Shawn White, Keynote’s director of external operations. “During the same period, the average availability of sites on the index dropped from almost 100% to 86%. The index returned to normal by 9:15pm (EDT).”

    Read More »
  • Using Metrics to Vanquish the Fail Whale

    June 23rd, 2009 : Rich Miller
    John Adams of the Twitter ops team discusses the use of metrics to improve web site performance at Velocity 2009 (Photo by James Duncan Davidson via Flickr)

    Few prominent web sites have failed more often and under closer scrutiny than Twitter. But over the past year the microblogging service has rehabilitated its reputation, improving its uptime even as its traffic has grown phenomenally.

    That torrid growth continues, despite reports to the contrary based on ComScore data, according to John Adams of Twitter’s operations team, who spoke this morning at the O’Reilly Velocity Conference in San Jose. “There are a lot of reports that our growth is slowing down,” said Adams. “I can’t say what the real numbers are. But it’s just not slowing down at all. All that traffic has led to an insane amount of pain.”

    Measuring and analyzing performance data has been the primary weapon in Twitter’s ongoing effort to vanquish the “Fail Whale” - the downtime mascot that appears whenever Twitter is unavailable.

    “You really want to instrument everything you have,” Adams told an audience of 700 operations professionals. “The best thing you can do is have more information about your system. We’ve built a process around using these metrics to make decisions. We use science. The way we find the weakest point in our infrastructure is by collecting metrics and making graphs out of them.”

    Read More »
  • Last.fm Down, London Data Center Overheats

    May 31st, 2009 : Rich Miller

    The streaming music hub Last.fm was offline for about 6 hours Sunday after multiple chillers failed in its London data center, causing a dramatic rise in temperature inside the data center. The temperature in one row of racks reached 50 degrees C (122 degrees F), according to a chart posted by Last.fm of conditions at its Braham Street data center, which is operated by Level 3 Communications. UPDATE: The Last.fm site is back up as of about 8:30 pm Eastern time.

    Last.fm kept its users updated with a series of whimsical posts on its Twitter feed. “Crikey, one of our data centers has overheated! We’re fixing it as fast as we can, but the site will be down for a bit,” read one Tweet. “Apologies for the downtime, our datacenter appears to have landed on the sun,” read another. It’s not clear whether the thermal event affected other customers, but Level 3’s facility also hosts many telecom providers, hosting companies and enterprise firms that would be less amused by a lengthy outage.

    TechCrunch noted that Last.fm had recently touted its uptime on Twitter and Flickr. Murphy must have noticed.

    Read More »
  • Ripples Felt From Outage at The Planet

    May 13th, 2009 : Rich Miller

    The Planet experienced network problems Tuesday and Wednesday that caused brief downtime for many customers, including several large web hosting providers. Tuesday’s incident started at about 4:45 pm Central time and lasted about 30 minutes, and affected two of the company’s Houston data centers (H1 and H2) as well as customers in its newest Dallas facility (D6). 

    On Wednesday morning, the H1 and H2 data centers were offline for about 25 minutes, starting at about 10:15 am Central time.

    While both incidents were relatively brief, they were widely felt. The Tuesday outage caused downtime for the customers of HostGator and Site5, two large hosting companies that lease servers from The Planet.

    During the downtime, the web sites and customer forums for both hosts were offline. The companies provided updates on their Twitter accounts (@site5 and @hostgator), reinforcing the growing importance of microblogging as a customer communication tool during hosting outages. The outage also disrupted service for the Tumblr microblogging service.

    Read More »
  • Weathering the Customer Service TweetStorm

    May 7th, 2009 : Rich Miller

    Web hosting outages are proving the power of Twitter as a real-time customer communications tool, with Los Angeles hosting provider Media Temple serving as the test case. When Media Temple experienced an extended outage in its grid hosting platform in early March, it was surprised to find that frustrated customers were seeking information on Twitter, rather than the company’s status blog or forums.

    “We always prided ourselves on being good communicators,” said Media Temple CEO Demian Sellfors. “But we weren’t ready in March. We had not yet gotten around to dealing with Twitter. It hit us like a ton of bricks. We needed to be in this channel, because this is where our customers are. We now have a full-time department that deals only with Twitter.”

    The company retooled its customer service operation, dedicating two full-time employees to monitoring Twitter and training eight other staffers to respond to customers via the microblogging service. That effort was tested this week, when Media Temple’s grid hosting service crashed again, and customers began Tweeting their pain.

    Scaling the Update Infrastructure
    MT’s team of “Twitterologists” has been manning the company Twitter account throughout, responding to complaints and directing customers to hourly updates on the system status blog, as well as the company’s promise of a one-year service credit to all customers whose sites were knocked offline by the incident.

    It’s too early to say whether the company’s Tweeting will salve customers’ anger over two lengthy outages. But Sellfors believes the improved Twitter response made a difference, and justified the company’s dedicated staffing. “Twitter is a fantastic platform for (responding to) incidents,” he said.

    Read More »
  • Server Problems Delay Nielsen TV Ratings

    May 7th, 2009 : Rich Miller

    Server problems at the Nielsen Company have delayed the TV ratings for network programs, causing consternation among network executives and media buyers who rely upon the numbers. Nielsen ratings data, which is usually delivered the next day, has arrived three and four days late, holding up key decisions about which network shows will get the axe because of poor ratings.

    Nielsen blamed a software bug from an unnamed vendor for the delays, which caused performance problems for the servers that support the data meters used to compile ratings for TV shows. MediaPost has additional details from Nielsen Executive Vice President-Global Business Services Mitchell Habib.  

    Read More »
  • Media Temple Issues 1-Year Outage Credit

    May 7th, 2009 : Rich Miller

    Media Temple is issuing a one-year service credit to its grid hosting customers whose sites have been knocked offline by a lengthy outage this week, the company said yesterday. The problems occurred on the same cluster of MT’s Grid-Service that was affected by a previous outage in early March.

    Media Temple extended the service credit as it continued to migrate customers from a troubled storage section of its Grid-Service Cluster.02. Sites hosted on Cluster.02 went down Monday afternoon at about 4:15 p.m., with the Los Angeles hosting company citing file corruption issues in its storage system. The problems were similar to those that crashed Cluster.02 for nearly two days in early March.

    In an update at 4 a.m. Pacific today (Thursday), Media Temple said all but 200 customer sites were back online. The problems are only affecting customers of its grid hosting service, and not those using MT for shared hosting or dedicated servers (such as TechCrunch). But Media Temple CEO Demian Sellfors said the company recognized that it needed to address the impact on Grid-Service customers who have suffered through two bouts of extended downtime.

    “We’re offering a one-year service credit to any customer on Cluster.02 who has been meaningfully affected by the outage,” said Sellfors. “If you were burned by the Grid-Service platform, we want you to be compensated and feel good about it.”

    Read More »
  • Is the Customer Always Right? Not at Internap

    April 15th, 2009 : Rich Miller

    Data center providers are usually loath to get into public disputes with customers, as it has a way of converting them into former customers. There’s been a doozy of a public spat this week involving dueling finger-pointing between Internap Network Services (INAP) and Ooma Inc. At issue is a network outage on Monday that disrupted service for Ooma’s VoIP customers and prompted an unflattering writeup on TechCrunch.

    Ooma chief marketing officer Rich Buchanan used his Twitter feed to blame Internap for the problems. “Ooma issues were linked to an outage at Internap,” Buchanan wrote. “It also affected RIM, Google, Yahoo, Blue Cross, TM, Verizon, and others.”

    When media reports began citing Internap, company spokeswoman Debra Forrester denied that Internap had any downtime, telling Betanews and the Business Times that the problems must be on Ooma’s network.

    Read More »
All Content on Data Center Knowledge
© 2009 Miller Webworks LLC
All Rights Reserved