Social Security Works to Avert Data Center Failure
Nearly two years after $500 million in stimulus funding was earmarked to build a new data center for the Social Security Administration, the project is already a year behind schedule and won’t be operational before 2016. In the meantime, the agency is trying to extend the life of a problem-plagued 30-year-old facility that serves as the primary data center supporting the delivery of $700 billion in payments annually to more than 56 million Americans.
In early February, the General Services Administration (GSA) chose a site in Frederick County, Maryland, as the home of the new National Support Center (NSC), which will replace the agency’s aging National Computer Center (NCC) in Woodlawn, Maryland. The site selection was originally scheduled to be completed in January 2010, but was delayed when government auditors expressed concern that the process had not given enough consideration to the cost of electric power.
Four Day Recovery Window
The Social Security Administration (SSA) recently completed a data center in North Carolina, dubbed the Second Support Center (SSC), to serve as a backup facility for the Woodlawn site. The agency had previously used a commercial data center as its backup. “In the event of an NCC failure, we can currently recover all critical workloads at the SSC within four days,” said Kelly Croft, Deputy Commissioner for Systems at the SSA, in Congressional testimony on Feb. 11. “Next year, we anticipate being able to reduce that recovery time to one day.”
Croft cited the “dire need” for a new data center. “Without a long-term replacement, the NCC will deteriorate to the point that a major failure to the building systems could jeopardize our ability to handle our increasing workloads without interruption,” Croft reported. “Despite all of our best efforts to preserve the NCC for as long as necessary, there is always the potential that a critical facility infrastructure system could suddenly fail.”
Croft’s testimony includes a litany of incidents and risks at the current 30-year-old NCC facility:
- No Dedicated Power: “Employee office spaces in other areas of the building share the same power lines and HVAC system as the data center. This design problem means that a potentially isolated issue in an area outside the data center, such as a minor receptacle overload at someone’s workstation, could temporarily shut down some power to the data center and HVAC system.”
- Aging Custom UPS System: “The UPS is not an off-the-shelf product; it was designed specifically for the building. While we have extended our service contract with the UPS maintenance vendor over the years, the vendor recently advised us that it could not guarantee repairs in the near future. The necessary parts are simply no longer available. If the UPS failed, we would have to bypass the system and deliver unconditioned power to the data center equipment, which could quite potentially damage the equipment. Replacing the UPS would require significant downtime at the NCC.”
- Cabling Problems: “Tangled cables can block the under-floor airflow that cools our servers, and we cannot work on the cables safely without shutting down the affected systems. Similarly, troubleshooting problems is difficult when we cannot isolate cable pairs easily to determine whether problems exist in the cables or in the IT equipment. There is also an elevated risk of data corruption, because electro-magnetic interference from the electrical wires that are located too close to the telecommunication wires can distort data transmission.”
- Water in the Data Center: “Last year, our facilities staff noticed water on the floor of one of the large battery rooms in the NCC. They quickly traced the source to a leaking water pipe in the room. Any water in close proximity to high-voltage batteries presents a serious hazard to the building and its personnel. In order to fix the leak, plumbers needed to expose the pipe and cut off the water supply. Unfortunately, without redundant systems, cutting off the water supply to the pipe also required cutting off the water supply to the large air handling equipment that is responsible for cooling our computing space. Since the air handling equipment had to be turned off, we had to actually shut down a portion of our national computing operations while making the repairs.”
Despite these problems, the latest GSA timetable states the construction of the new Social Security data center will be completed in September 2014, with the agency requiring 18 months to install equipment and systems in the new facility. This places the current operational start date at August 2016. That timetable means that even as stimulus funds are supporting the completion of the new data center, the SSA will be investing in stop-gap measures to keep the NCC operational.
Band-aiding Existing Infrastructure
“Realizing that we will have to rely on the NCC for at least the next 5 years, we will do what we can to extend the life of the building,” said Croft. “We are working with GSA to complete a Building Engineering Report and a feasibility study to provide an updated assessment of the NCC facility systems and structure.”
“Relying on short-term fixes to serious problems at an old data center is just too much of a risk for our nation,” said Rep. Jeff Denham (R-CA), chairman of the Economic Development, Public Buildings and Emergency Management Subcommittee. “That is why it is particularly troubling that the timeline for completion of the new data center has already slipped by a year. We cannot afford any further slip in the timeline and we cannot afford any added costs. The operations of this data center are too critical for the American people and this project is too costly to allow any more delays. GSA and SSA must work together to identify risks in the process and either avoid or mitigate against them.”
Michael Dinsmore, posted February 22nd, 2011
The timelines and costs associated with this datacenter are preposterous. Private industry can accomplish larger projects faster and at lower cost. Unlike a project with military security requirements, this one has no particular criteria that make the task harder to accomplish than a private cloud datacenter. It’s clear to me why the government isn’t able to accomplish anything of substantive value.
Many of the issues facing the Social Security National Computer Center are typical of aging data centers around the globe. Constraints on power, space and cooling, as well as concerns regarding efficiency and availability, are top-of-mind for any data center manager. Although the NCC is experiencing some challenges, steps can be taken to enhance operations and improve availability and reliability of the physical infrastructure that supports the Social Security Administration’s critical mission.
Even with the newly planned facility underway, solutions can be applied to support the existing legacy facility and then be moved to the new facility later on. Utilizing scalable solutions that can be easily retrofitted as demand changes is one option. Loads can be transferred to a COTS-based UPS platform that has a scalable capacity feature. Once the platform is installed, software keys can be added as load increases to meet the expanding demand. There are also cooling solutions that can be retrofitted and deployed in the existing facility – e.g. cold aisle containment, high density targeted cooling solutions, etc. – that can address issues in the current facility and be relocated to the new facility. This is a perfect example of a facility whose mission is so critical that it can’t compromise on availability and reliability in the name of efficiency.
mb, posted March 4th, 2011
SSA IT has always pushed to be state of the art, trying to stay ahead of the technology curve instead of striving to be functional, efficient and reliable. I imagine this is the case here. SSA in technology since the turn of the century has been like a cat trying to catch its tail. They strain to untie knots only to tie them again in a tighter knot. They are their own worst enemy when it comes to achieving success.
Applying archaic business structures and processes to new technology models does no more than make future change very difficult. SSA has never seemed to learn this fact.