Technical Details of Facebook Outage

Facebook was down for more than two hours Thursday afternoon, marking its longest outage in about four years. The Facebook Engineering blog has posted a detailed explanation of what happened. “The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition,” writes Facebook’s Robert Johnson. “An automated system for verifying configuration values ended up causing much more damage than it fixed.”

In short: A configuration change created a feedback loop that overwhelmed a database cluster. The only way to fix the problem was to take the whole cluster offline, which meant downtime for the website. Read the Engineering blog for more details.
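
To make the feedback loop concrete, here is a toy simulation in Python. It is a rough sketch, not Facebook's code: the function name, the client count, and the capacity figure are all hypothetical. It assumes, as the Engineering post describes, that a client which sees an invalid cached value asks the database for a fresh copy, and that the cache only recovers if the stored copy is itself valid.

```python
def simulate(persistent_value_valid, clients=1000, db_capacity=200, ticks=5):
    """Return the number of database queries attempted on each tick.

    All names and numbers here are illustrative assumptions.
    """
    cache_valid = False  # a config change has just left the cached value invalid
    load = []
    for _ in range(ticks):
        if cache_valid:
            load.append(0)  # every client is served from the cache
            continue
        queries = clients  # every client sees a bad value and queries the database
        load.append(queries)
        answered = min(queries, db_capacity)  # the rest fail or time out
        # The cache is repaired only if some query returns a valid stored value.
        # If the stored value is invalid, each "fix" fails and the loop repeats.
        cache_valid = persistent_value_valid and answered > 0
    return load


healthy = simulate(persistent_value_valid=True)
broken = simulate(persistent_value_valid=False)
print(healthy)  # [1000, 0, 0, 0, 0]
print(broken)   # [1000, 1000, 1000, 1000, 1000]
```

When the stored value is valid, the load spikes once and the cache absorbs it. When the stored value is invalid, the demand never drops below what the database can serve, which is why stopping traffic to the cluster was the way out.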

About the Author

Rich Miller is the founder and editor-in-chief of Data Center Knowledge, and has been reporting on the data center sector since 2000. He has tracked the growing impact of high-density computing on the power and cooling of data centers, and the resulting push for improved energy efficiency in these facilities.
