Microsoft’s Windows Azure cloud service has been hit with a series of performance problems today, leaving customers unable to manage their applications for about 8 hours and knocking Azure-based services offline for some North American users.
Microsoft said the Azure service management problems were caused by “a cert issue triggered on 2/29/2012” – presumably a date-related glitch with a security certificate triggered by the onset of the Feb. 29 “Leap Day,” which occurs once every four years.
UPDATE: Microsoft has now confirmed this. “While final root cause analysis is in progress, this issue appears to be due to a time calculation that was incorrect for the leap year,” Microsoft’s Bill Laing writes on the Windows Azure blog [1].
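Microsoft has not published the code involved, so the following is only an illustrative sketch of how a leap-year time calculation can go wrong, not a reconstruction of what happened inside Azure. It assumes a certificate validity period computed by adding one year to the issue date; the helper names `naive_one_year_later` and `safe_one_year_later` are hypothetical.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    """Hypothetical buggy helper: bump the year, keep month and day unchanged."""
    # For Feb. 29 this asks for Feb. 29 of a non-leap year, which does not exist,
    # so date.replace() raises ValueError.
    return start.replace(year=start.year + 1)


def safe_one_year_later(start: date) -> date:
    """Hypothetical fix: fall back to Feb. 28 when the target year has no Feb. 29."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        # Only reachable when start is Feb. 29 and the next year is not a leap year.
        return start.replace(year=start.year + 1, day=28)


issued = date(2012, 2, 29)

try:
    naive_one_year_later(issued)
    naive_failed = False
except ValueError:
    naive_failed = True

expiry = safe_one_year_later(issued)
print("naive calculation failed:", naive_failed)
print("safe expiry date:", expiry.isoformat())
```

On any other day of the year the two helpers agree, which is why a bug of this kind can sit unnoticed until Feb. 29 arrives.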
The Azure team deployed a software update to fix the problem, which was rolled out gradually. Microsoft said management functions were “restored for the majority of customers” by 1:30 pm GMT (8:30 am Eastern).
The Windows Azure Compute service began experiencing problems early this morning, several hours after the service management issues were seen.
“Incoming traffic may not go through for a subset of hosted services,” Microsoft said. “Deployed applications will continue to run … We are executing restoration steps to mitigate the issue.” Microsoft apologized for the inconvenience to users.
The outage is the latest in a series of cloud outages that are shaping how users approach resiliency of their cloud applications. In April 2011 Amazon Web Services experienced an extended outage [2] that caused downtime or performance problems for many social media services that rely on the company’s cloud computing services. In August the European cloud operations of both Microsoft and Amazon were knocked offline [3] by a power outage in Dublin.
Perhaps the biggest impact of these outages has been seen in how existing users approach cloud architectures, according to William Fellows, co-founder and Principal Analyst at The 451 Group. “End users now want to mandate that they have multi-cloud strategies,” Fellows said in a panel last fall discussing the outages.
Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.
Article printed from Data Center Knowledge: http://www.datacenterknowledge.com
URL to article: http://www.datacenterknowledge.com/archives/2012/02/29/windows-azure-cloud-hit-by-downtime/
URLs in this post:
[1] Windows Azure blog: http://blogs.msdn.com/b/windowsazure/archive/2012/03/01/windows-azure-service-disruption-update.aspx
[2] extended outage: http://www.datacenterknowledge.com/archives/2011/04/21/major-amazon-outage-ripples-across-web/
[3] knocked offline: http://www.datacenterknowledge.com/archives/2011/08/07/lightning-in-dublin-knocks-amazon-microsoft-data-centers-offline/
[4] Rich Miller: http://www.datacenterknowledge.com/archives/author/richm/