Technical Details of Facebook Outage

Facebook was offline for more than two hours today after a configuration change created a feedback loop that overwhelmed a database cluster. The only way to fix the problem was to take the web site offline.

Facebook was down for more than two hours Thursday afternoon, marking its longest outage in about four years. The Facebook Engineering blog has posted a detailed explanation of what happened. "The key flaw that caused this outage to be so severe was an unfortunate handling of an error condition," writes Facebook's Robert Johnson. "An automated system for verifying configuration values ended up causing much more damage than it fixed."

In short: A configuration change created a feedback loop that overwhelmed a database cluster. The only way to fix the problem was to take the whole cluster offline - which meant downtime for the web site. Read the Engineering blog for more details.
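
To make the idea of a feedback loop concrete, here is a toy simulation. It is not Facebook's system: the function name, the numbers, and the retry rule (every query the cluster cannot serve comes back on the next tick) are all assumptions chosen for illustration. It only shows how load that exceeds capacity can keep growing instead of settling.

```python
def simulate_feedback_loop(base_load, capacity, steps):
    """Toy model (hypothetical): unserved queries are retried on the next tick.

    base_load: new queries arriving per tick
    capacity:  queries the cluster can serve per tick
    Returns the offered load (new queries plus retries) at each tick.
    """
    backlog = 0
    history = []
    for _ in range(steps):
        offered = base_load + backlog      # new queries plus retries
        served = min(offered, capacity)    # the cluster serves what it can
        backlog = offered - served         # failures come back as retries
        history.append(offered)
    return history


# Below capacity, the offered load stays flat.
print(simulate_feedback_loop(base_load=80, capacity=100, steps=5))

# Above capacity, retries pile onto new queries and the load climbs each tick.
print(simulate_feedback_loop(base_load=120, capacity=100, steps=4))
```

In this sketch the backlog shrinks only once new queries fall below capacity, which is loosely analogous to stopping traffic so the cluster can recover.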
