Do Brief Outages Count? Google's SLA Says No
Is an outage of less than 10 minutes still an outage? Not according to the Service Level Agreement (SLA) for Google Apps, which includes a 99.9 percent uptime guarantee. Pingdom read the fine print in the SLA and found the following definition:
“Downtime Period” means, for a domain, a period of ten consecutive minutes of Downtime. Intermittent Downtime for a period of less than ten minutes will not be counted towards any Downtime Periods.
This loophole has been used to posit unlikely worst-case scenarios in which Google could have repeated short outages and still honor its SLA guarantees (which in turn has prompted discussion over at TechCrunch). The larger issue is whether Google is defining outages in a way that waters down the uptime guarantee, serving to provide additional protection to the provider rather than the customer.
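To see why the ten-minute rule matters, here is a minimal Python sketch of the quoted definition. It assumes availability is sampled once per minute and that every minute of a run reaching ten consecutive down minutes counts toward downtime (the quoted text does not say how longer runs are split). The function names are hypothetical. A 99.9 percent guarantee over a 30-day month (43,200 minutes) leaves room for about 43 minutes of counted downtime, yet a nine-minute outage every hour never counts at all under this reading.

```python
from typing import List

# Simplified model of the quoted SLA clause. Assumptions: availability is
# sampled once per minute (True = up), and only minutes inside runs of at
# least ten consecutive down minutes count as Downtime.
DOWNTIME_PERIOD_MINUTES = 10


def counted_downtime(samples: List[bool], threshold: int = DOWNTIME_PERIOD_MINUTES) -> int:
    """Down minutes that fall inside qualifying Downtime Periods."""
    counted = run = 0
    for up in samples + [True]:  # trailing sentinel closes a final outage run
        if not up:
            run += 1
        else:
            if run >= threshold:
                counted += run  # the whole run qualifies (assumption)
            run = 0  # shorter runs are dropped entirely
    return counted


def raw_downtime(samples: List[bool]) -> int:
    """Every down minute, regardless of how short the outage was."""
    return sum(1 for up in samples if not up)


def uptime_percent(samples: List[bool], down_minutes: int) -> float:
    return 100.0 * (len(samples) - down_minutes) / len(samples)


# Worst case: a 30-day month with a 9-minute outage at the top of every hour.
month = [(minute % 60) >= 9 for minute in range(30 * 24 * 60)]
print(uptime_percent(month, raw_downtime(month)))      # 85.0
print(uptime_percent(month, counted_downtime(month)))  # 100.0
```

The gap between the two figures is the loophole: 6,480 minutes of real downtime, none of which forms a Downtime Period under this reading.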
How does your SLA define an outage?
From Surpass Hosting’s SLA:
“…In the event that there is no web site availability, Surpass will credit the monthly service charge for the service as calculated below and as measured 24 hours a day in a calendar month. The maximum credit is not to exceed the monthly service charge for the affected month:
Web site availability = credit
95% to 99.4% = 25%
90% to 94.9% = 50%
89.9% or below = 100%
In order for you to receive a credit on your account, you must request such credit within seven (7) business days after you experienced no web site availability so that we may check our stats and your stats. You must request credit by sending a request to our accounting division through our helpdesk at https://desk.surpasshosting.com. For security, the body of this message must contain your domain name, the dates and times of the unavailability of your web site, and such other customer identification requested by Surpass. Credits will usually be applied within sixty (60) days of your credit request. Credit to your account shall be your sole and exclusive remedy in the event that there is no web site availability.”
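The credit table above amounts to a simple tier lookup. The sketch below is a hypothetical Python model of it, not Surpass's own calculation. It assumes availability is measured per minute over the whole calendar month, and, because the published tiers leave small gaps (for example, 94.95% falls between 94.9% and 95%), it places such values in the lower-availability band.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(down_minutes: int, days_in_month: int) -> float:
    """Percentage of the calendar month the web site was available (assumed per-minute measurement)."""
    total = days_in_month * MINUTES_PER_DAY
    return 100.0 * (total - down_minutes) / total


def credit_percent(availability: float) -> int:
    """Credit as a percentage of the monthly service charge, per the quoted tiers."""
    if availability > 99.4:
        return 0    # above the first published tier: no credit
    if availability >= 95.0:
        return 25   # 95% to 99.4%
    if availability >= 90.0:
        return 50   # 90% to 94.9% (gap values such as 94.95% land here by assumption)
    return 100      # 89.9% or below


def credit_amount(monthly_charge: float, availability: float) -> float:
    """Credit owed; 100% is the largest tier, so it never exceeds the monthly charge."""
    return monthly_charge * credit_percent(availability) / 100
```

For example, five hours (300 minutes) of unavailability in a 31-day month works out to roughly 99.33% availability, which falls in the 25% tier; on a hypothetical $20 monthly charge that is a $5 credit, and only if it is requested within the seven-business-day window.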
Raj Honnaya, posted December 27th, 2008
Working 24/7 to learn what it really means to be down and what an acceptable downtime window is, I have found that at a cloud computing company, uptime is defined as service availability from the customer's standpoint. There are always two aspects of availability and uptime: one defined and experienced by customers, and the other defined and experienced from the operational standpoint. A port check can show the site up at five nines practically 24/7, but in reality some functionality within the site may not have been up at five nines.
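The gap between the operational view and the customer view can be sketched with two probes written against the Python standard library. `port_check` is the kind of TCP port check described above; `functional_check` is a hypothetical customer-level check that fetches a page and looks for content a user actually needs. The function names, timeouts, and the choice of "HTTP 200 plus a text match" as the success criterion are assumptions for illustration.

```python
import http.client
import socket
import urllib.request


def port_check(host: str, port: int, timeout: float = 3.0) -> bool:
    """Operational view: is anything accepting TCP connections on host:port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def functional_check(url: str, expected_text: str, timeout: float = 5.0) -> bool:
    """Customer view: does the page load with HTTP 200 and show the expected content?"""
    # Ignore proxy settings from the environment so the probe hits the site directly.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = resp.read().decode("utf-8", errors="replace")
            return resp.status == 200 and expected_text in body
    except (OSError, http.client.HTTPException):
        # URLError and HTTPError (e.g. a 500 from a broken page) subclass OSError.
        return False
```

Pointed at a site whose web server is listening but whose checkout page returns an error, `port_check` reports the site up while `functional_check` on that page reports it down. That is the discrepancy between five nines at the port and the availability customers actually experience.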
The real-life challenge is to find that middle ground and meet or exceed expectations of service availability.
One can reach that middle ground by breaking the service down to its atomic level and setting up monitoring at the component level. Every component can then be reviewed from all aspects, from the hardware it is plugged into to the version of the software code that determines its availability. Uptime and outages can then be handled at a granular level rather than at the port level.
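One way to roll component-level monitoring back up into a single service figure is the common serial model: if the service needs every component, and failures are assumed independent, service availability is the product of the component availabilities. This is a standard simplification, not necessarily the method the comment has in mind, and the component names and minute counts below are hypothetical. It also shows why component-level data matters: three components at 99.9% each yield only about 99.7% for the service as a whole.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Component:
    """One monitored piece of the service, tracked by its own health check."""
    name: str
    up_minutes: int     # minutes the component passed its own check
    total_minutes: int  # minutes in the measurement window

    @property
    def availability(self) -> float:
        return self.up_minutes / self.total_minutes


def service_availability(components: List[Component]) -> float:
    """Serial model: the service is up only when every component is up.
    Assumes component failures are independent."""
    return prod(c.availability for c in components)


def weakest_component(components: List[Component]) -> Component:
    """The component with the lowest availability, i.e. where to look first."""
    return min(components, key=lambda c: c.availability)


MONTH = 30 * 24 * 60  # 43,200 minutes
stack = [  # hypothetical components and figures
    Component("load balancer", MONTH, MONTH),
    Component("app server v2.3", MONTH - 43, MONTH),
    Component("database", MONTH - 5, MONTH),
]
print(f"{service_availability(stack):.5%}")
print(weakest_component(stack).name)  # app server v2.3
```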
It is more of a three-dimensional approach to defining the outage window.