Blackberry Network Recovering After Major Outage
December 23rd, 2009 : Rich Miller
A major national network outage at Research in Motion (RIMM) left millions of Blackberry users without access to e-mail for at least eight hours, apparently due to “emergency maintenance” on the Blackberry network. The outage appears to have affected Blackberry service on all carriers in all regions, with downtime beginning in late afternoon Tuesday and extending into early Wednesday. As of 6 a.m. Wednesday morning, reports indicated that service was returning for many users.
The communications problems extended to RIM itself, which has been silent on its company Twitter accounts, even though Twitter has evolved into one of the most effective channels for outage updates from many service providers. RIM appears to have confirmed the outage when contacted by major tech media, but has not issued any direct updates to users. That silence left frustrated users scrambling for information about the service’s second significant outage in a week, following extended downtime on Dec. 17.
UPDATE: RIM now says the problems are software-related. “(The) root cause is currently under review, but based on preliminary analysis, it currently appears that the issue stemmed from a flaw in two recently released versions of BlackBerry Messenger (versions 5.0.0.55 and 5.0.0.56) that caused an unanticipated database issue within the BlackBerry infrastructure,” the company said. “RIM has taken corrective action to restore service.” The full statement is online at Crackberry.com.
In the meantime, RIM is advising users who downloaded or upgraded BlackBerry Messenger since December 14th to upgrade to a new version (5.0.0.57) which resolves the issue.
The outages highlight RIM’s ongoing effort to build additional data centers to add more capacity and redundancy to the Blackberry network, which has suffered a series of extended outages in recent years. Research in Motion said it added 4.4 million users to the Blackberry network in the third quarter of 2009, which helped the company post stronger-than-expected earnings.
Network Issue Cited in Rackspace Outage
December 18th, 2009 : Rich Miller
Rackspace says a network peering problem caused an outage this afternoon that affected its Cloud Sites cloud computing service. The incident resulted in downtime for some sites hosted in the company’s Dallas data center, which has experienced several outages this year due to power problems. But Rackspace said the problems originated outside the Dallas facility.
Rackspace said the incident began at 3:42 p.m. and the network was restored at 4:13 p.m. Discussion on networking groups suggested Rackspace may have experienced a “routing loop,” in which packets continue to be routed in an endless circle, a condition that can result from hardware failures or configuration problems.
UPDATE: “The issues resulted from a problem with a router used for peering and backbone connectivity located outside the data center at a peering facility, which handles approximately 20% of Rackspace’s Dallas traffic,” Rackspace said in an incident report on its blog. “The problems stemmed from a configuration and testing procedure made at our new Chicago data center, creating a routing loop between the Chicago and Dallas data centers. This activity was in final preparation for network integration between the Chicago and Dallas data centers. The network integration of the facilities was scheduled to take place during the monthly maintenance window outside normal business hours, and today’s incident occurred during final preparations.”
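The routing loop described above can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Rackspace’s configuration: the router names, the 203.0.113.0/24 documentation prefix and both forwarding tables are hypothetical. Each router looks up the destination, hands the packet to its next hop and decrements the packet’s TTL (time to live). When two routers each point at the other, the packet circles between them until the TTL runs out and it is dropped, so traffic for that destination never arrives.

```python
from typing import Dict, List, Tuple

# Hypothetical forwarding tables: router -> {destination prefix: next hop}.
# "LOCAL" means the router delivers the packet itself.
Tables = Dict[str, Dict[str, str]]

# Assumed misconfiguration: Dallas and Chicago each think the other is the path.
LOOPED: Tables = {
    "dallas": {"203.0.113.0/24": "chicago"},
    "chicago": {"203.0.113.0/24": "dallas"},
}

# Corrected tables: Chicago delivers the traffic locally.
FIXED: Tables = {
    "dallas": {"203.0.113.0/24": "chicago"},
    "chicago": {"203.0.113.0/24": "LOCAL"},
}


def forward(tables: Tables, start: str, dest: str, ttl: int) -> Tuple[str, List[str]]:
    """Forward a packet hop by hop; return (outcome, routers visited)."""
    path: List[str] = []
    router = start
    while True:
        path.append(router)
        next_hop = tables[router].get(dest)
        if next_hop is None:
            return "dropped: no route", path
        if next_hop == "LOCAL":
            return "delivered", path
        ttl -= 1  # every forwarding step uses up one unit of TTL
        if ttl == 0:
            return "dropped: TTL expired", path
        router = next_hop
```

With the looped tables, a packet sent from `"dallas"` with a TTL of 8 alternates between the two routers until its TTL is exhausted; with the corrected tables it is delivered after one hop. The TTL stops any single packet from circling forever, but every packet caught in a loop still consumes link capacity until it expires.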
Major Data Center Outages of 2009
December 16th, 2009 : Rich Miller
It’s hard to say whether data center outages were more frequent in 2009 than in the past. But they were certainly more visible, as round-the-clock consumption of blogs and social networks made downtime harder to hide, and Twitter amplified customer complaints. First, let’s look at the outages (there were some doozies, and sometimes they came in bunches), and then review how social media altered the status quo for data center downtime in 2009.
Here are the top 10 data center outages of 2009, in no particular order:
Lengthy Outage at Fisher Plaza: The early July outage at this Seattle data center hub was widely felt, affecting e-commerce around the globe as Authorize.net went offline. The outage, which also affected availability for customers of Internap and AdHost and Microsoft’s Bing Travel site, was later blamed on an insulation failure in a bus duct.
Michael Jackson’s Death Slows The Web: On June 25 the Internet creaked under the weight of millions of users seeking news on the death of pop star Michael Jackson. As the news of Jackson’s death circulated, the traffic jam spread to many large news sites. An analysis by Keynote Systems later blamed some of the problems on slow-loading third-party content like ad networks and widgets.
The Sidekick Snafu: On Oct. 10 T-Mobile told all users of its popular Sidekick mobile device that their data had been lost due to a server failure at Microsoft’s Danger subsidiary. Microsoft was eventually able to recover much of the endangered data, but not before a vigorous debate broke out about whether the Sidekick fiasco could be tied to the risks of cloud computing and online storage.
Total Data Loss for Ma.gnolia: Users who stored bookmarks online using the Ma.gnolia service were not as lucky. All of the site’s user data was irretrievably lost in the Jan. 30 database crash. The data disaster underscored the importance of sound backup practices, as well as the challenge of running a large service as a one-man operation.
Twitter Felled by DDoS: On Aug. 6 an electronic attack known as a distributed denial of service (DDoS) targeted sites on several major social networks. While Facebook and LiveJournal were slowed, Twitter crashed completely for about three hours before restoring service. The attacks continued for weeks as Twitter worked with its data center provider, NTT America, to strengthen its defenses.
‘100% Data Loss’ for Coding Horror Site
December 11th, 2009 : Rich Miller
The popular programmer blog Coding Horror has experienced a “100 percent data loss” and is seeking to restore its content from search engine caches. Site maintainer Jeff Atwood is blaming CrystalTech, saying the hosting provider was not properly handling the backup process for his virtual machines. But Atwood also acknowledges that he “absolutely should have done complete offsite backups … all my backups were unfortunately on the server itself, so save the lecture, you’re 100% absolutely right, but that doesn’t help me at the moment.”
Atwood believes he will be able to recover the text of his blog posts from web search caches, but says the same technique will not work for images, and is seeking help in finding a way to recover archived images from his site. The data loss also affected the blog for the Stack Overflow forum operated by Atwood and Joel Spolsky.
Brief Power Outage for Amazon Data Center
December 10th, 2009 : Rich Miller
Amazon Web Services experienced an outage in one of the East Coast availability zones for its EC2 service early Wednesday due to power problems in a data center in northern Virginia. Failures in a power distribution unit (PDU) resulted in some servers in the data center losing power for about 45 minutes. It took several more hours to get customer instances back online, with all but a “small number” of instances restored within five hours.
“This incident impacted a subset of instances in a single Availability Zone,” said Amazon spokesperson Kay Kinton. “Most of that subset of instances were back online in 45 minutes.”
The issues started at 4 a.m. East Coast time Wednesday and affected one of the three availability zones in Amazon’s East Coast operation. The zones are designed to provide redundancy for developers by allowing them to deploy apps across several zones.
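Availability zones only provide that redundancy if an app is actually spread across them. The following is a simplified illustration under stated assumptions, not Amazon’s API: the zone names and instance names are hypothetical, instances are assigned to zones round-robin, and a power failure is modeled as the loss of every instance in one zone.

```python
from typing import Dict, List


def place(instances: List[str], zones: List[str]) -> Dict[str, str]:
    """Assign instances to zones round-robin so no single zone holds them all."""
    return {inst: zones[i % len(zones)] for i, inst in enumerate(instances)}


def surviving(placement: Dict[str, str], failed_zone: str) -> List[str]:
    """Instances still running if every server in one zone loses power."""
    return [inst for inst, zone in placement.items() if zone != failed_zone]


# Hypothetical zone and instance names, for illustration only.
ZONES = ["zone-a", "zone-b", "zone-c"]
WEB = [f"web-{i}" for i in range(6)]

spread = place(WEB, ZONES)          # two instances per zone
single = place(WEB, ["zone-b"])     # everything in one zone
```

Spreading six instances across three zones leaves four running when `zone-b` fails; putting all six in `zone-b` leaves none. The trade-off in a real deployment is extra cost and cross-zone coordination in exchange for surviving a single-facility failure like the PDU problem described above.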
Bing Busted, Briefly
December 3rd, 2009 : Rich Miller
Microsoft’s Bing search engine was offline Thursday evening, experiencing an outage of between 45 minutes and an hour. Outages for major search engines are relatively rare, and the downtime was quickly noted around Twitter and tech news sites. Microsoft’s Bing team acknowledged the outage, and later confirmed the site’s return to service. “Details to come once we have the full picture,” Microsoft said on its Twitter account for Bing.
UPDATE: Microsoft has provided a post-mortem on the outage. “The cause of the outage was a configuration change during some internal testing that had unfortunate and unintended consequences,” the Bing team wrote. “As soon as the issue was detected, the change was rolled back, which caused the site to return to normal behavior. Unfortunately the detection and rollback took about half an hour, and during that time users were unable to use bing.com.”
DreamHost Migration Snafu Causes Outages
December 2nd, 2009 : Rich Miller
Shared hosting provider DreamHost has managed a lot of data center migrations over the years as it switched among Los Angeles colocation providers, some of whom were acquired along the way. The fast-growing company has opted not to build its own data center, but recently decided to take a large equity investment in Alchemy Communications and move the rest of its gear into an Alchemy facility.
So now DreamHost owns a bigger chunk of the problem, as the data center migration went poorly, with network problems leaving many customers offline for days. Additional details are available at The WHIR and the DreamHost status page.
European Data Center Revenue May Double
November 30th, 2009 : John Rath
Several stories from recent weeks highlight the vibrant data center industry in Europe. Here’s a roundup:
European data centre revenue set to double
A report published by Tariff Consultancy Ltd notes that European data centre revenue is “set to more than double over the five year period from 2010 to 2015, with net raised floor space to increase by 70%, driven primarily by price increases.” The report gives pricing and forecasts for 19 of the EU25 countries and analyzes pricing of a standard 19″ rack, a small cage space and a 50 kVA suite of space for each of the countries. It also dives into trends impacting data centres such as raised floor capacity in markets, revenue per square meter forecasts, electricity pricing, pricing per rack and cage, and the most expensive data centre countries.
Savvis received EuroFIT award
Financial technology publication Waters announced its inaugural EuroFIT awards earlier in the month, to recognize Europe’s hottest financial IT products and services. In the category of Best Datacenter Hosting Provider, Savvis (SVVS) took the award as a company capitalizing on the rising demand for data center services. Equinix was listed as an honorable mention in the category. A little over a year ago Savvis marked the completion of a global data center expansion by opening a 37,500 square foot facility on the outskirts of London in Slough. The award also noted that Savvis serves seven of the top ten Fortune 500 financial services and banking firms. Amazon (AMZN) won the Best Cloud Provider award as an “overwhelming leader in the field.”
The Bunker selected by Cimar
The Bunker announced that it was selected by Cimar (UK) Limited to provide managed, ultra-secure hosting of its radiology image sharing web service. The Bunker delivered a scalable platform to Cimar built on Microsoft technology. Howard Jenkinson, managing director of Cimar, said “absolute information security is a pre-requisite for any digital service carrying sensitive patient information.” A video of ‘The Bunker’ and details of its July 2009 130,000 square foot expansion are also available.
eBay Apologizes for Search Snafu
November 23rd, 2009 : Rich Miller
When is your site up, but not really up? For online auction house eBay, that would be when your search function is busted. The search feature, which is a key to connecting the site’s millions of buyers and sellers, was down for much of Saturday, prompting an apology to users as eBay restored search functionality in phases. When shoppers searched for an item, the eBay site returned limited or no results.
“We are happy to report that critical search functionality was restored overnight on Saturday, and we are seeing normal activity levels today,” eBay’s Lorrie Norrington said Sunday in a statement. “As part of our effort to restore critical search functionality as quickly as possible for sellers and for buyers, we have kept some secondary search features temporarily offline. This includes refining search by certain item specifics, such as color or clothing size, and having Store Inventory Format results included in the main search results. We expect to bring these features online today as part of a phased approach to restore full functionality.”
