It’s become an increasingly familiar story: A small ISP somewhere makes an error in its Internet routing announcements, triggering chaotic events like the IP “hijacking” of YouTube in 2008 and an incident in February that caused outages at major hosting providers.
It happened again last week, according to Renesys, which reports that an incorrect BGP announcement by a small ISP in Nagoya, Japan, triggered a wave of errant updates. The ripple effect was traced to a weakness in updated code for a version of Cisco’s IOS operating system for routers. Cisco promptly posted a security advisory and patch.
The incident prompted commentary from James Cowie of Renesys on the potential for future trouble due to software updates.
“The global mesh of BGP-speaking routers that we call the Internet has inherent vulnerabilities that stem from the software quality and policy weaknesses of its weakest participants, and the amplification potential of its best-connected participants,” Cowie wrote on the Renesys blog. “Running sloppy software at the edge of the routing mesh (in enterprises, say) is unlikely to give anyone the ability to propagate large amounts of instability or partition the Internet. But closer to the core, I think we have a serious problem to contemplate.
“Remember, if you can get just one provider to listen to you, and not filter your announcements, you can get your message into the ear of just about every BGP-speaking router on the planet within about thirty seconds,” Cowie continued. “And if some subpopulation of those routers can be reset, they act as amplifiers for your instability. Power law outage-size distributions are not a myth — they are a logical consequence of the structure of the Internet, the importance of a few key participants in carrying global traffic, and their reliance for interconnection on technologies that are clearly still in the shaking-out-the-obvious-bugs mode.”
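Cowie's point about providers that do not filter announcements can be illustrated with a rough sketch. The Python below is not how any real router or ISP implements filtering; it only shows the idea of checking a customer's announced prefix against a list of prefixes that customer is allowed to originate. The `ALLOWED` table, the customer name, and the `accept_announcement` function are hypothetical, and the address blocks are reserved documentation ranges.

```python
import ipaddress

# Hypothetical table of prefixes each customer may announce.
# The addresses are reserved documentation ranges, not real allocations.
ALLOWED = {
    "customer-a": [ipaddress.ip_network("192.0.2.0/24")],
}

def accept_announcement(customer, prefix, max_len=24):
    """Return True only if the prefix falls inside a block registered to
    the customer and is no more specific than max_len (an assumed policy)."""
    net = ipaddress.ip_network(prefix)
    if net.prefixlen > max_len:
        return False  # too specific; reject under the assumed policy
    for allowed in ALLOWED.get(customer, []):
        # Compare only networks of the same IP version.
        if net.version == allowed.version and net.subnet_of(allowed):
            return True
    return False  # unknown customer or prefix outside its registered blocks

print(accept_announcement("customer-a", "192.0.2.0/24"))
print(accept_announcement("customer-a", "198.51.100.0/24"))
```

A provider applying a check like this would drop the second announcement, since that prefix is not registered to the customer, rather than passing it on to the rest of the mesh.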
The good news is that disruptive routing errors tend to be noticed and resolved quickly. But the “how could this happen” factor remains high.