Admin Error Brings Down Joyent’s Ashburn Data Center

Outage caused by 'fat-fingered admin' forces reboot of all compute nodes in provider’s US-East-1 region.

Yevgeniy Sverdlik

May 28, 2014

Joyent conference schwag with CTO Bryan Cantrill’s name tag on display at the 2011 Node Knockout hackathon. (Source: Joyent’s Facebook profile)

Joyent, a San Francisco-based provider of high-performance cloud infrastructure services, saw one of its data centers go down Tuesday as a result of an error made by an administrator. The company had to reboot all servers in its US-East-1 data center, located in Ashburn, Virginia.

The provider has not released information on what exactly caused the outage, but is promising a “full postmortem.” In a forum post on Hacker News, Joyent CTO Bryan Cantrill wrote that the company would be providing the information “as soon as we reasonably can.”

Cloud outages sting more than others

Outages of service provider data centers cause a lot more damage than enterprise data center outages do because they host infrastructure for many companies instead of one. Cloud data center outages are especially painful because each physical server may be a host to multiple customers’ virtual compute nodes.

Another service provider, Internap, which offers cloud hosting services, experienced three outages at its New York City data centers during the past two weeks. The company did not say how many customers the outages affected overall, but at least 20 companies were affected by one of the incidents.

Internap’s problems were caused by electrical equipment failure. This kind of outage is different from Joyent’s. Internap’s outage happened at the facilities layer of the stack, while Joyent’s incident happened at the IT administration level.

‘Fat finger’ shouldn’t hurt so much

While human error was at fault, Joyent’s system ideally would have been built to withstand such errors. “While the immediate cause was operator error, there are broader systemic issues that allowed a fat finger to take down a data center,” Cantrill wrote, adding that the company would be improving software and operational procedures to prevent such incidents from happening in the future.

Joyent does not plan to discipline the administrator who made the error, Cantrill told The Register, explaining that the company was more interested in learning from the incident than in punishing people.

Joyent provides public and private cloud infrastructure services for companies that need more computing horsepower than the mainstream Infrastructure-as-a-Service providers, such as Amazon Web Services, can offer.

In addition to the Ashburn data center, brought online in February 2012, its cloud infrastructure lives in data centers in San Francisco, Las Vegas and Amsterdam.
