Social Security Works to Avert Data Center Failure

A $500 million project to build a new data center for the Social Security Administration is a year behind schedule, and won't be ready until 2016. Meanwhile, the agency is trying to extend the life of a problem-plagued 30-year-old facility that is critical to the SSA.

A look at the equipment area of the primary data center for the Social Security Administration, revealing some serious cable management challenges beneath the raised floor.

Nearly two years after $500 million in stimulus funding was earmarked to build a new data center for the Social Security Administration, the project is already a year behind schedule and won't be operational before 2016. In the meantime, the agency is trying to extend the life of a problem-plagued 30-year-old facility that serves as the primary data center supporting the delivery of $700 billion in payments annually to more than 56 million Americans.

In early February, the General Services Administration (GSA) chose a site in Frederick County, Maryland, as the home of the new National Support Center (NSC), which will replace the agency's aging National Computer Center (NCC) in Woodlawn, Maryland. The site selection was originally scheduled to be completed in January 2010, but was delayed when government auditors expressed concern that the process had not given enough consideration to the cost of electric power.

Four Day Recovery Window

The Social Security Administration (SSA) recently completed a data center in North Carolina, dubbed the Second Support Center (SSC), to serve as a backup facility for the Woodlawn site.  The agency had previously used a commercial data center as its backup. "In the event of an NCC failure, we can currently recover all critical workloads at the SSC within four days," said Kelly Croft, Deputy Commissioner for Systems at the SSA, in Congressional testimony on Feb. 11. "Next year, we anticipate being able to reduce that recovery time to one day."

Croft cited the "dire need" for a new data center.  "Without a long-term replacement, the NCC will deteriorate to the point that a major failure to the building systems could jeopardize our ability to handle our increasing workloads without interruption," Croft reported. "Despite all of our best efforts to preserve the NCC for as long as necessary, there is always the potential that a critical facility infrastructure system could suddenly fail."

Croft's testimony includes a litany of incidents and risks at the current 30-year-old NCC facility:

  • No Dedicated Power: "Employee office spaces in other areas of the building share the same power lines and HVAC system as the data center. This design problem means that a potentially isolated issue in an area outside the data center, such as a minor receptacle overload at someone’s workstation, could temporarily shut down some power to the data center and HVAC system."
  • Aging Custom UPS System: "The UPS is not an off-the-shelf product; it was designed specifically for the building. While we have extended our service contract with the UPS maintenance vendor over the years, the vendor recently advised us that it could not guarantee repairs in the near future. The necessary parts are simply no longer available. If the UPS failed, we would have to bypass the system and deliver unconditioned power to the data center equipment, which could quite potentially damage the equipment. Replacing the UPS would require significant downtime at the NCC."
  • Cabling Problems: "Tangled cables can block the under-floor airflow that cools our servers, and we cannot work on the cables safely without shutting down the affected systems. Similarly, troubleshooting problems is difficult when we cannot isolate cable pairs easily to determine whether problems exist in the cables or in the IT equipment. There is also an elevated risk of data corruption, because electro-magnetic interference from the electrical wires that are located too close to the telecommunication wires can distort data transmission."
  • Water in the Data Center:  "Last year, our facilities staff noticed water on the floor of one of the large battery rooms in the NCC. They quickly traced the source to a leaking water pipe in the room. Any water in close proximity to high-voltage batteries presents a serious hazard to the building and its personnel. In order to fix the leak, plumbers needed to expose the pipe and cut off the water supply. Unfortunately, without redundant systems, cutting off the water supply to the pipe also required cutting off the water supply to the large air handling equipment that is responsible for cooling our computing space. Since the air handling equipment had to be turned off, we had to actually shut down a portion of our national computing operations while making the repairs."

Current Timetable

Despite these problems, the latest GSA timetable states the construction of the new Social Security data center will be completed in September 2014, with the agency requiring 18 months to install equipment and systems in the new facility. This places the current operational start date at August 2016. That timetable means that even as stimulus funds are supporting the completion of the new data center, the SSA will be investing in stop-gap measures to keep the NCC operational.

Band-aiding Existing Infrastructure

"Realizing that we will have to rely on the NCC for at least the next 5 years, we will do what we can to extend the life of the building," said Croft. "We are working with GSA to complete a Building Engineering Report and a feasibility study to provide an updated assessment of the NCC facility systems and structure."

"Relying on short-term fixes to serious problems at an old data center is just too much of a risk for our nation," said Rep. Jeff Denham (R-CA), chairman of the Economic Development, Public Buildings and Emergency Management Subcommittee. "That is why it is particularly troubling that the timeline for completion of the new data center has already slipped by a year. We cannot afford any further slip in the timeline and we cannot afford any added costs. The operations of this data center are too critical for the American people and this project is too costly to allow any more delays. GSA and SSA must work together to identify risks in the process and either avoid or mitigate against them."
