Virginia IT Systems Lack Network Redundancy
Virginia’s new state IT system is experiencing downtime in key state services because of a mind-boggling oversight: the state apparently neglected to include network redundancy as a requirement in a 10-year, $2.3 billion outsourcing deal won by Northrop Grumman.
Here’s what George Coulter encountered when he took over as the state’s chief information officer in August: “The first thing I noticed was that the network that Northrop Grumman rolled out didn’t have redundancy, backup,” Coulter told the Richmond Times-Dispatch. “The contract does not call for redundancy in carriers . . . in the network. Why that wasn’t put into the network, I don’t know. This is a service we have to have.”
The oversight is taking its toll. The Richmond paper reports that in just five weeks this fall, the Virginia DMV suffered 12 computer system outages, putting individual offices out of business for a total of more than 100 hours. “The problem of no-redundancy . . . accounts for 90 percent of our outages,” DMV chief information officer David Burhop told the Times-Dispatch.
The paper says the lack of redundancy also hampered communications services for the Virginia Department of Transportation when a state of emergency was declared during heavy rains and flooding from the Nov. 11 Northeaster.
Here’s another angle to this story. The state’s $2.3 billion outsourcing deal with Northrop Grumman was signed in 2005, shortly before the appointment of Aneesh Chopra as Virginia’s Secretary of Technology. In April of this year Chopra was named the Obama administration’s chief technology officer (CTO) for the federal government.
Since he arrived after the deal was signed, Chopra isn’t responsible for leaving a network redundancy requirement out of the outsourcing contract. But in nearly four years as the state’s Secretary of Technology, it appears Chopra either never realized or never addressed the scope of the problem.
The issue is relevant given the Obama administration’s ambition to shift much of the government’s IT operations to a cloud computing model. If problems emerge as federal apps move to the cloud, will Chopra and his team be able to identify and resolve them? This is especially critical for requirements such as redundancy and uptime. Let’s hope the Obama team can improve on the experience in Virginia.
Coulter is calling for an emergency meeting of the state’s Information Technology Investment Board in the first week of December to deal with the lack of redundancy in the state’s IT system.
Austin, posted November 23rd, 2009
With the buying power in that contract, it should be easy to get two carriers and use lots of VPNs.
nate, posted November 23rd, 2009
Of course, multiple carriers often don’t give you complete redundancy. AT&T has a funny slide showing a cut trunk that carried something like four of the biggest carriers (including AT&T itself), all in the same trunk, so all four were lost when the cable was cut.
And in my experience, at least, it can be nearly impossible to get the carriers to tell you how diverse the paths really are. There are so many shared exchanges.
I bet most commercial data centers are the same. They may advertise multiple carriers and so on, but I bet that in most cases, not far upstream, a single cable or structure can take out all of those carriers to that facility simultaneously.
But I suppose for a shop the size of the one mentioned in the article it could be one of those CYA situations.
I wonder if the Northrop people actually did the work and determined that multiple carriers wouldn’t buy them much so they didn’t implement them.
G-Man, posted November 23rd, 2009
This seems to happen most often when value engineering comes into play and the cost of the redundant circuits turns a ring topology into a lateral or linear one. I would bet the state or NG struck the redundancy and felt good about saving the money when nobody at the state complained, since the apps may have been resilient without it. Obviously it was not stress tested in ’05…
PGT, posted November 24th, 2009
I think the author pulled out his ‘Jump to Conclusions’ mat to link VITA’s woes to Cloud Computing. Seriously?
As for redundancy, most carriers won’t go ‘open kimono’ on disclosure of actual routes for a circuit until after an outage. They see it as ‘proprietary’ info and only the most determined customer will get their due diligence honored. I’ve been on the carrier side of the table – a hosting customer wanted a redundant circuit and spent money for a second demarc in his building. He wanted a concatenated circuit from us as a backup to his primary bandwidth. I tried to get him the route maps but Legal refused. Fast forward a year and one of our techs was doing maintenance on a switch and unplugged the live side of his circuit (thinking he was working on the secondary path), taking the guy down.
He demanded (and got) his route map by threatening to sue for breach. It turned out the circuit was diverse except for one quarter-mile section where both strands rode the same conduit to pass under a bridge (the company was too cheap to install a new path). The shocker was that this collapsed section started at the Lucent 5ESS switch site and POP, so every circuit coming out of that site was exposed to a failure on that short stretch of road.
My experience is that everyone wants Tier IV, uber-redundant service until they see the price tag. Listen, this could be any number of issues, and as for there being no redundancy in Northrop’s network, there has to be some redundancy. Whether or not the configurations were correct, etc., there are too many possible points of failure, as others have pointed out. Bottom line: they will get it fixed and this will be a non-issue shortly…
The problem is not the lack of available redundancy; the problem is who pays for it. Apparently the state never bothered to demand a readily available feature – there are plenty of telecom access points near this site, and it is only a question of connecting to them. That still costs money (maybe running new lines) and requires somebody to think about it in the first place. Both parties could be at fault, but Northrop isn’t that dumb. Northrop had an obligation to raise the issue at the design stage and most likely did. Virginia had an obligation to raise it at the proposal stage for all parties’ consideration. However, I’m inclined to agree with G-Man above: probably both parties knew, winked, and moved on.
As to carriers’ “open disclosure” of routes, this is the state, not a commercial client. Somebody on the right hand should ask the left hand what is going on. The state has access to route maps that commercial clients don’t. I’ve seen them.
PGT, posted November 24th, 2009
The state has access to route maps? So what? Unless they have the ‘as built’ map for the solution they rely on, that means nothing. One has to investigate the CLR (circuit layout record) from the A to Z points to verify the path a circuit takes. This is necessary even if you’re using two carriers: you want to make sure there’s no single point of failure. At the end of the day, you’re dealing with the company that owns and maintains the cable plant (most likely Verizon in VITA’s case), so any carrier providing a circuit relies on it for at least the ‘last mile’ unless the carrier does its own build into the facility (as Cox, Cogent, and providers with a similar business model do).
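The A-to-Z check PGT describes can be sketched in a few lines. This is a minimal illustration, assuming each circuit’s route can be transcribed from its CLR as an ordered list of physical segments (conduits, bridge crossings, switch sites, POPs). Any segment that appears on both the primary and backup routes is a potential single point of failure, like the shared quarter-mile conduit in the story earlier in the thread. The segment names and the functions `shared_segments` and `is_diverse` are hypothetical; real CLRs have no standard machine-readable format assumed here.

```python
from typing import List, Set

# Hypothetical model: a route is an ordered list of physical segment IDs
# transcribed by hand from a circuit layout record (CLR).
Route = List[str]


def shared_segments(primary: Route, backup: Route) -> Set[str]:
    """Return every segment that appears on both routes.

    Each segment in the result is a place where a single cut or
    facility failure would take down both circuits at once.
    """
    return set(primary) & set(backup)


def is_diverse(primary: Route, backup: Route) -> bool:
    """True only if the two routes share no physical segment at all."""
    return not shared_segments(primary, backup)


if __name__ == "__main__":
    # Hypothetical routes: separate demarcs and conduits, but both paths
    # squeeze through the same bridge conduit and the same POP.
    primary = ["demarc-A", "conduit-12", "bridge-conduit", "pop-1"]
    backup = ["demarc-B", "conduit-40", "bridge-conduit", "pop-1"]

    print(sorted(shared_segments(primary, backup)))  # ['bridge-conduit', 'pop-1']
    print(is_diverse(primary, backup))               # False
```

In the example the two circuits look redundant on paper (different demarcs, different conduits), yet the check flags the shared bridge conduit and POP. The hard part in practice is the one the commenters describe: getting an accurate as-built route out of the carrier in the first place.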
PGT, posted November 24th, 2009
I should mention that I used to work for the carrier that had a VITA contract for network connectivity in Richmond. I believe they lost it a few years back, possibly with Northrop coming into the picture. I believe much of the IT staff we worked with at VITA got let go (due to the Northrop contract??), though my memory might be foggy on that as it wasn’t my account and it’s been several years now.