The Salesforce.com Outage and Dashboards

Is a network uptime dashboard useful if it goes down during an outage? That’s the question raised by Tuesday’s 40-minute outage at Salesforce.com, which left 900,000 subscribers without access to their applications. The downtime was attributed to a network device failing due to memory allocation errors.

“The failure caused it to stop passing data but did not properly trigger a graceful fail over to the redundant system as the memory allocation errors were present on the failover system as well,” Salesforce.com reported on its status dashboard. “This resulted in a full service failure for all instances. Salesforce.com had to initiate manual recovery steps to bring the service back up.”

The Register said the outage “exposed the dark side of cloud computing,” demonstrating the vulnerability of the cloud. Others took a more practical view of the issues raised by the downtime. 

“This event clearly shows us why hosting your own public health dashboard is a problem,” writes Lenny Rachitsky at Transparent Uptime. “The dashboard was down along with the site itself.”

Rachitsky is not without an interest in the topic, as he’s a senior engineer at WebMetrics, which provides third-party performance monitoring. But his blog provides some useful analysis of web site performance, including a list of public health dashboards and status sites for SaaS providers.  

Salesforce.com, for its part, says it will “continue to work with hardware vendors to fully detail the root cause and identify if further patching or fixes will be needed.”

About the Author

Rich Miller is the founder and editor at large of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.

5 Comments

  1. Clearly, if ONE network device brought down all of salesforce.com, then by definition they are NOT using "cloud computing" technology, correct? If their systems were truly distributed, a single device could not take down the entire cloud for any length of time, only a portion of it. How then can this be the "dark side of cloud computing"? In reality it just sounds like another day in Web 1.0.

  2. Jon Q Progammer

    I'm a Jon Q. Public first-year programmer. I wouldn't put a network monitoring system on the system it is monitoring. How would it be able to monitor the system if the system goes down? It would also take away valuable system resources that would be better used on the applications. I just got out of college and even I know this. Force.com is the "leader" in this market? Jeesh, they are hurting.