Rackspace says a network peering problem caused an outage this afternoon that affected its Cloud Sites cloud computing service. The incident resulted in downtime for some sites hosted in the company's Dallas data center, which has experienced several outages this year due to power problems. But Rackspace said the problems originated outside the Dallas facility.
Rackspace said the incident began at 3:42 p.m. and the network was restored at 4:13 p.m. Discussion on networking groups suggested Rackspace may have experienced a "routing loop," in which packets are passed between the same routers in an endless circle instead of reaching their destination. Routing loops can result from hardware failures or configuration problems.
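For readers unfamiliar with the term, the toy Python sketch below illustrates how a routing loop behaves. It is not a model of Rackspace's network: the router names, the routing tables and the `forward` helper are all hypothetical, chosen only to show two routers that each list the other as the next hop for the same destination. The hop limit stands in for the time-to-live (TTL) field that IP uses to eventually discard looping packets.

```python
def forward(tables, start, dest, ttl=8):
    """Follow next-hop entries from start toward dest.

    tables maps each router name to a {destination: next_hop} dict.
    Returns (outcome, path), where outcome is "delivered", "no_route"
    or "ttl_expired". All names here are illustrative assumptions.
    """
    path = [start]
    node = start
    while node != dest:
        if ttl == 0:
            # Hop budget exhausted: the packet is dropped, as with IP TTL.
            return "ttl_expired", path
        next_hop = tables.get(node, {}).get(dest)
        if next_hop is None:
            return "no_route", path
        ttl -= 1
        node = next_hop
        path.append(node)
    return "delivered", path


# Hypothetical healthy tables: traffic leaves through a peering router.
healthy = {
    "dallas": {"customer": "peering"},
    "peering": {"customer": "customer"},
}

# Hypothetical misconfiguration: each site points at the other.
looped = {
    "dallas": {"customer": "chicago"},
    "chicago": {"customer": "dallas"},
}

print(forward(healthy, "dallas", "customer"))
print(forward(looped, "dallas", "customer"))
```

In the second call the packet bounces between the two hypothetical sites until its hop budget runs out, so it never reaches the destination. From the outside, that looks like the affected sites being unreachable.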
UPDATE: "The issues resulted from a problem with a router used for peering and backbone connectivity located outside the data center at a peering facility, which handles approximately 20% of Rackspace’s Dallas traffic," Rackspace said in an incident report on its blog. "The problems stemmed from a configuration and testing procedure made at our new Chicago data center, creating a routing loop between the Chicago and Dallas data centers. This activity was in final preparation for network integration between the Chicago and Dallas data centers. The network integration of the facilities was scheduled to take place during the monthly maintenance window outside normal business hours, and today’s incident occurred during final preparations."
Peering allows two providers exchanging large volumes of traffic to save money by connecting directly, rather than routing traffic across their paid Internet connections. When a peering connection is interrupted, it can disrupt access to sites hosted on the peering partners' networks.
This was the second outage for Cloud Sites in the past six weeks, following a Nov. 3 outage in which sections of the Dallas data center lost power while power distribution units (PDUs) were being tested during scheduled maintenance. This time, Rackspace said, there were no power problems at its Dallas facility. The Dallas data center has experienced power problems before, including outages on June 29 and July 7.
Some high-profile customers were affected, including the TechCrunch blog, 37signals and Laughing Squid web hosting.