Windows Azure Cloud Hit By Downtime

Microsoft’s Windows Azure cloud service has been hit with a series of performance problems today, leaving customers unable to manage their applications for about 8 hours and knocking Azure-based services offline for some North American users.

Microsoft said the Azure service management problems were caused by “a cert issue triggered on 2/29/2012” – presumably a date-related glitch with a security certificate, triggered by the onset of the Feb. 29 “Leap Day,” which occurs once every four years. UPDATE: Microsoft has now confirmed this. “While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year,” Microsoft’s Bill Laing writes on the Windows Azure blog.
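Microsoft has not published the faulty calculation, so the following is only an illustrative sketch of how leap-day date arithmetic can go wrong in general, not a reconstruction of the Azure bug. It assumes a hypothetical routine that computes a date one year ahead (as might be done for an expiry date) by simply incrementing the year. The function names are invented for this example.

```python
from datetime import date

def naive_one_year_later(start: date) -> date:
    # Hypothetical naive approach: bump the year, keep month and day as-is.
    # Raises ValueError when start is Feb. 29, because the following
    # year has no Feb. 29.
    return start.replace(year=start.year + 1)

def safe_one_year_later(start: date) -> date:
    # One possible fix: fall back to Feb. 28 when the target year
    # has no Feb. 29.
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)

leap_day = date(2012, 2, 29)

try:
    naive_one_year_later(leap_day)
    naive_failed = False
except ValueError:
    naive_failed = True

print(naive_failed)                   # True
print(safe_one_year_later(leap_day))  # 2013-02-28
```

Code like the naive version works on every other day of the year, which is why this class of bug tends to surface only on Feb. 29.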

The Azure team deployed a software update to fix the problem, which was rolled out gradually. Microsoft said management functions were “restored for the majority of customers” by 1:30 pm GMT (8:30 am Eastern).

The Windows Azure Compute service began experiencing problems early this morning, several hours after the service management issues were seen.

“Incoming traffic may not go through for a subset of hosted services,” Microsoft said. “Deployed applications will continue to run … We are executing restoration steps to mitigate the issue.” Microsoft apologized for the inconvenience to users.

The outage is the latest in a series of cloud outages that are shaping how users approach resiliency of their cloud applications. In April 2011 Amazon Web Services experienced an extended outage that caused downtime or performance problems for many social media services that rely on the company’s cloud computing services. In August 2011 the European cloud operations of both Microsoft and Amazon were knocked offline by a power outage in Dublin.

Perhaps the biggest impact of these outages has been on how existing users approach cloud architectures. “End users now want to mandate that they have multi-cloud strategies,” said William Fellows, co-founder and Principal Analyst at The 451 Group, in a panel last fall discussing the outages.

About the Author

Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

4 Comments

  1. As with most public cloud systems, Azure also experienced its day in the dark. Having spent the last 3+ years in this ecosystem, I can say that most Azure users are in the long tail: startups, upstart efforts from large enterprises, etc. This outage went under the radar for the most part due to the long-tail nature of the user base. As with most apps deployed to public clouds, a good monitoring and auto-healing system is critical to ensure timely notification and recovery. We have outlined that in our blog here http://www.opstera.com/blog/