Numerous connectivity service providers had trouble maintaining services when the global BGP routing table reached a critical threshold. (Photo by ServerCentral)

BGP Routing Table Size Limit Blamed for Tuesday’s Website Outages

Many websites, including Data Center Knowledge, responded sporadically from certain locations Tuesday, but the outages did not result from loss of power at a hosting company’s or a cloud provider’s data center, a flood or a network cable severed by a squirrel. They were attributed to a structural problem in the way the Internet is built.

That problem is the capacity of a certain type of memory chip on older-generation router hardware used in many service providers’ infrastructure. Ternary Content-Addressable Memory (TCAM) is the memory routers use to store the Internet’s routing table. In very simple terms, the routing table is a combination of an address book and a map for the routes Internet traffic travels on.
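To make the "address book and map" idea concrete, here is a minimal sketch of a route lookup in Python. It is not how a TCAM is implemented (the hardware compares an address against all entries in parallel); it only illustrates the usual rule that the most specific matching prefix wins. The prefixes, next-hop names and the `lookup` helper are all made up for illustration.

```python
import ipaddress

# Toy routing table: prefix -> next hop. Every entry here is hypothetical
# and uses documentation address ranges.
ROUTES = {
    ipaddress.ip_network("0.0.0.0/0"): "upstream-default",
    ipaddress.ip_network("203.0.113.0/24"): "peer-a",
    ipaddress.ip_network("203.0.113.128/25"): "peer-b",
}


def lookup(table, address):
    """Return the next hop of the most specific prefix that contains address."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None
    # Longest-prefix match: the entry with the largest prefix length wins.
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]


if __name__ == "__main__":
    for ip in ("203.0.113.200", "203.0.113.5", "198.51.100.1"):
        print(ip, "->", lookup(ROUTES, ip))
```

Every distinct prefix in the table takes up one entry, which is why the total number of routes matters to the hardware.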

The number of routes a TCAM can store is finite, as a post on The IPv4 Depletion Site blog, run by a group of network and IT experts, explains. While workarounds have been developed to deal with this limit, not all routing equipment (especially older routing equipment) has been upgraded to use them. On Tuesday morning, the Internet felt a very distinct tremor that resulted from the size of the routing table reaching that magic number of 512,000 BGP routes. BGP is the protocol used to communicate routing information.
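The effect of a hard limit can be sketched with a toy table in Python. This is an illustration under stated assumptions, not a model of any vendor's hardware: the `FixedCapacityTable` class and `simulate` function are hypothetical, the capacity is the 512,000 figure quoted above, and real routers differ in what they do with routes that do not fit (this sketch simply counts them as rejected).

```python
class FixedCapacityTable:
    """Toy stand-in for a hardware route table with a hard entry limit."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.routes = {}
        self.rejected = 0  # routes that arrived after the table was full

    def install(self, prefix, next_hop):
        # Updating an existing prefix reuses its slot; a new prefix needs a free one.
        if prefix in self.routes or len(self.routes) < self.capacity:
            self.routes[prefix] = next_hop
            return True
        self.rejected += 1
        return False


def simulate(capacity, announced):
    """Install `announced` distinct placeholder prefixes; return (stored, rejected)."""
    table = FixedCapacityTable(capacity)
    for i in range(announced):
        table.install(f"prefix-{i}", "next-hop")
    return len(table.routes), table.rejected


if __name__ == "__main__":
    stored, rejected = simulate(capacity=512_000, announced=512_100)
    print(f"stored={stored} rejected={rejected}")
```

In this toy model every route past the limit is simply missing from the table, so traffic to those destinations has no hardware entry to match.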

Representatives of the hosting company Liquid Web (which hosts Data Center Knowledge, among many others) indicated on the company’s Twitter feed that the issue had been attributed to the table size hitting the TCAM limit.

Since the issue affected numerous network operators, it was not easy to send traffic around affected areas of the Internet. “Generally, we would reroute traffic, but this is being hindered by the amount of providers experiencing outages,” the Liquid Web team tweeted.

According to downdetector.com, service providers that had network issues Tuesday morning included Comcast, Level 3, AT&T, Cogent, Verizon, Time Warner and possibly others. Outage start times, courtesy of downdetector:

  • Comcast is having issues since 8:30 AM EDT
  • Level 3 is having issues since 9:55 AM EDT
  • AT&T is having issues since 9:35 AM EDT
  • Cogent Communications is having issues since 10:10 AM EDT
  • Verizon Communications is having issues since 10:41 AM EDT
  • Time Warner Cable is having issues since 10:01 AM EDT

Things began looking up in the afternoon, when Liquid Web tweeted, “As ISP’s have recovered from #512k active bgp routes being reached, many of our customers affected by these carrier issues have regained ability to reach their sites.”

The hosting company updated its Twitter feed around 3 pm Pacific, saying all of its customers had regained connectivity from all locations.

About the Author

San Francisco-based business and technology journalist. Editor in chief at Data Center Knowledge, covering the global data center industry.

2 Comments

  1. jbizzle

    why not just all ISP's increase the minimum prefix length from /24 to /25?

  2. this looks strange..but the big ISPs don't have some sort of load balancing that can be implemented on the routers ? Please give your views. Thanks.