BGP Routing Table Size Limit Blamed for Tuesday's Website Outages

Many websites, including Data Center Knowledge, responded sporadically from certain locations Tuesday, but the outages did not result from loss of power at a hosting company’s or a cloud provider’s data center, a flood or a network cable severed by a squirrel. Instead, the cause was attributed to a structural limitation in the way the Internet is built.

The issue is the capacity of a certain type of memory chip in older-generation router hardware used in many service providers’ infrastructure. Ternary Content-Addressable Memory (TCAM) is the memory routers use to store the Internet’s routing table. In very simple terms, the routing table is a combination of an address book and a map of the routes Internet traffic travels on.
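To make the address-book-and-map analogy concrete, here is a minimal Python sketch of a routing-table lookup. It assumes longest-prefix matching, the usual rule for IP forwarding; the prefixes, the next-hop names and the `lookup` function are hypothetical, and a real TCAM compares all entries in hardware at once rather than looping over them.

```python
import ipaddress

# Hypothetical routing table: destination prefix -> next hop (illustrative names only).
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-default",
    ipaddress.ip_network("203.0.113.0/24"): "peer-a",
    ipaddress.ip_network("203.0.113.128/25"): "peer-b",
}

def lookup(address: str) -> str:
    """Return the next hop of the most specific prefix that contains the address."""
    ip = ipaddress.ip_address(address)
    matches = [net for net in ROUTES if ip in net]
    if not matches:
        raise LookupError(f"no route to {address}")
    # Longest-prefix match: the entry with the largest prefix length wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return ROUTES[best]
```

Every entry in a table like this occupies memory, which is why the total number of routes matters to the hardware that stores them.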

The number of routes a TCAM can store is finite, as a post on The IPv4 Depletion Site blog, run by a group of network and IT experts, explains. While workarounds have been developed to deal with this limit, not all routing equipment (especially older routing equipment) has been upgraded to use them. On Tuesday morning, the Internet felt a very distinct tremor that resulted from the size of the routing table reaching that limit, the magic number of 512,000 BGP routes. BGP (Border Gateway Protocol) is the protocol used to communicate routing information.
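The effect of a finite table can be illustrated with a short Python sketch. The class name, its methods and the choice to simply refuse routes once the table is full are assumptions made for illustration; how real routers behave at the limit varies by platform and is not described here. The capacity is taken as 512,000 to match the figure above.

```python
TCAM_CAPACITY = 512_000  # assumed capacity, matching the 512,000-route figure

class FixedCapacityTable:
    """Hypothetical route table that cannot grow past a fixed number of entries."""

    def __init__(self, capacity: int = TCAM_CAPACITY):
        self.capacity = capacity
        self.routes = set()
        self.rejected = 0  # routes that arrived after the table was full

    def install(self, prefix: str) -> bool:
        """Try to store a route; return False if there is no room left."""
        if prefix in self.routes:
            return True  # already stored, uses no extra space
        if len(self.routes) >= self.capacity:
            self.rejected += 1
            return False
        self.routes.add(prefix)
        return True

# Usage with a tiny capacity so the overflow is easy to see.
table = FixedCapacityTable(capacity=3)
for prefix in ["192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "203.0.113.128/25"]:
    table.install(prefix)
```

In this toy run the fourth route has nowhere to go, which is the situation older equipment faced once the global table crossed its limit.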

Representatives of the hosting company Liquid Web (which hosts Data Center Knowledge, among many others) indicated on the company’s Twitter feed that the issue had been attributed to the table size hitting the TCAM limit.

Since the issue affected numerous network operators, it was not easy to send traffic around affected areas of the Internet. “Generally, we would reroute traffic, but this is being hindered by the amount of providers experiencing outages,” the Liquid Web team tweeted.

According to downdetector.com, service providers that had network issues Tuesday morning included Comcast, Level 3, AT&T, Cogent, Verizon, Time Warner and possibly others. Outage start times, courtesy of downdetector:

Comcast: 8:30 AM EDT
Level 3: 9:55 AM EDT
AT&T: 9:35 AM EDT
Cogent Communications: 10:10 AM EDT
Verizon Communications: 10:41 AM EDT
Time Warner Cable: 10:01 AM EDT

Things began looking up in the afternoon, when Liquid Web tweeted, “As ISP’s have recovered from #512k active bgp routes being reached, many of our customers affected by these carrier issues have regained ability to reach their sites.”

The hosting company updated its Twitter feed around 3 pm Pacific, saying all of its customers had regained connectivity from all locations.
