The San Diego Supercomputer Center (SDSC) at the University of California, San Diego, launched a large academic-based cloud storage system, designed for researchers, students, academics, and industry users who require stable, secure, and cost-effective storage and sharing of digital information, including extremely large data sets.
Scale and Speed
The new 5.5 petabyte system is 100 percent disk-based and interconnected by high-speed 10 gigabit Ethernet switching technology, providing extremely fast read and write performance. Scalable to hundreds of petabytes, the SDSC Cloud has sustained read rates of 8 to 10 gigabytes per second, and those rates are expected to improve as additional nodes and storage are added. The cloud can be accessed through the Rackspace or Amazon S3 APIs, or through the SDSC Cloud Explorer web interface.
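To put the quoted figures in perspective, the short Python sketch below estimates how long it would take to read the full 5.5 petabytes once at the sustained rates of 8 and 10 gigabytes per second. It assumes decimal units (1 PB = 10^15 bytes, 1 GB = 10^9 bytes) and a perfectly steady rate, neither of which the announcement specifies; the function name is illustrative only.

```python
# Rough estimate only: assumes decimal units and a constant sustained rate.
PETABYTE = 10**15  # bytes (assumed decimal definition)
GIGABYTE = 10**9   # bytes (assumed decimal definition)
SECONDS_PER_DAY = 86_400


def full_read_time_days(capacity_bytes: float, rate_bytes_per_s: float) -> float:
    """Days needed to read `capacity_bytes` once at a steady `rate_bytes_per_s`."""
    return capacity_bytes / rate_bytes_per_s / SECONDS_PER_DAY


if __name__ == "__main__":
    capacity = 5.5 * PETABYTE
    for rate_gb in (8, 10):
        days = full_read_time_days(capacity, rate_gb * GIGABYTE)
        print(f"{rate_gb} GB/s: about {days:.1f} days")
```

Under these assumptions, a single full pass over the system takes roughly six to eight days.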
The SDSC Cloud leverages the infrastructure designed for a high-performance parallel file system by using two Arista Networks 7508 switches, providing 768 total 10 gigabit (Gb) Ethernet ports for more than 10Tbit/s of non-blocking, IP-based connectivity. Costs on the SDSC Cloud site are listed at as low as $3.25 per month for 100GB ($32.50 per terabyte per month), with no transfer costs.
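The listed price scales linearly from 100GB to a terabyte, which can be checked with a few lines of Python. The sketch below assumes a flat rate of $3.25 per 100GB per month at every size and a decimal terabyte (1,000GB); the announcement only says prices start "as low as" that figure, so actual pricing may differ. The function name is hypothetical.

```python
from decimal import Decimal

# Assumption: a flat rate at all sizes, taken from the lowest listed price.
PRICE_PER_100_GB = Decimal("3.25")  # US dollars per month


def monthly_cost_usd(size_gb: int) -> Decimal:
    """Monthly storage cost in dollars for `size_gb` gigabytes at the flat rate."""
    return PRICE_PER_100_GB * Decimal(size_gb) / Decimal(100)


if __name__ == "__main__":
    for size_gb in (100, 1_000, 10_000):
        print(f"{size_gb:>6} GB: ${monthly_cost_usd(size_gb):.2f} per month")
```

Because no transfer costs are listed, the storage charge is the only term in this estimate.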
Massive Data – Long Term Storage
“We believe that the SDSC Cloud may well revolutionize how data is preserved and shared among researchers, especially massive datasets that are becoming more prevalent in this new era of data-intensive research and computing,” said Michael Norman, director of SDSC. “The SDSC Cloud goes a long way toward meeting federal data sharing requirements, since every data object has a unique URL and could be accessed over the Web.”
The project began as part of UC San Diego’s campus Research Cyberinfrastructure (RCI) project and quickly grew in scope, attracting partners who saw it as functionally revolutionary and cost-effective for their needs. “The SDSC Cloud marks a paradigm shift in how we think about long-term storage,” said Richard Moore, SDSC’s deputy director. “We are shifting from the ‘write once and read never’ model of archival data, to one that says ‘if you think your data is important, then it should be readily accessible and shared with the broader community.’”
SDSC serves as the site lead for TeraGrid, the grid infrastructure for open scientific research, and also hosts the Triton Resource compute system. In this 2010 video, SDSC’s Data Center Manager Matt Campbell provides a short tour of the facility.