Twitter: We’re Not Sure What’s Wrong
Downtime happens. When outages occur, service providers usually move quickly to investigate and fix the problem, and then communicate with customers about what happened. The customer message almost always centers on restoring confidence and trust: your service was interrupted, but we reacted quickly, have addressed the problem, learned from it and taken steps to ensure that it doesn’t happen again.
“We’ve gone through our various databases, caches, web servers, daemons, and despite some increased traffic activity across the board, all systems are running nominally. The truth is we’re not sure what’s happening. It seems to be occurring in-between these parts. We’re busy working on instrumenting and adding meters to provide visibility into what’s slowing Twitter down. We’ll use this data both to alleviate the current woes and to help inform our long-term architecture work to make Twitter a utility service people can count on.”
“We’re not sure what’s happening?” That can’t be good, but at least it’s honest. After months of “we’re about to fix things” posts, Twitter clearly has a credibility problem on performance issues. The good news? They’ve just raised $15 million in venture capital funding, which should buy some additional monitoring capabilities and allow them to add infrastructure – in case that turns out to be what’s busted.