A flawed software upgrade is being blamed for Monday’s four-hour outage of the BlackBerry e-mail network. A botched software upgrade was also cited in a lengthy April 2007 outage on the network operated by Research in Motion (RIMM). Late yesterday the company issued a statement about the cause of Monday’s incident:
RIM’s early investigation of the service interruption that occurred on Monday points to a problem with an internal data routing system within the BlackBerry service infrastructure that had been recently upgraded. The upgrade was part of RIM’s routine and ongoing efforts to increase overall capacity for longer term growth. RIM continuously increases the capacity of its infrastructure in advance of longer term demand. Similar upgrades have been successfully implemented in the past, but there appears to have been a problem with this specific upgrade that caused the intermittent service delays.
Research in Motion is known to have conducted maintenance over the Feb. 9-10 weekend, including upgrades to hardware components, databases and administrative networking systems.
In the wake of the outage, some analysts were recommending that clients not rely solely on the BlackBerry network. Many analysts are raising questions about RIM’s infrastructure design and apparent reliance on data centers in a single location – Waterloo, Ontario – to route all its North American mail.
“The failure raised questions as to whether RIM, which has chosen to keep its entire network infrastructure in-house and in relatively centralized form, can continue to scale to meet demand for BlackBerry services,” wrote InformationWeek.
The New York Times noted that RIM “has previously said that North American messages are handled by two operations located near its head office in Waterloo, Ontario. Several analysts speculated that because not all eight million North American BlackBerry users were affected on Monday, only one of those centers was plagued by software trouble.”
Telecommunications analyst Iain Grant told the Times that the latest outage strongly suggests “that R.I.M. needs additional operating centers to provide more backup and that its software testing systems are inadequate for the company’s operating scale.”