Bionimbus Applies Cloud Power to Genetic Data-Crunching
May 16th, 2013 By: John Rath
An ambitious project at the University of Chicago aims to lead the nation in biomedical computation, by making the region the largest hub in the world for genetic and medical information.
At the forefront of the effort is Bionimbus, an open source cloud-based system for managing, analyzing and sharing genomic data. Developed by the Institute for Genomics and Systems Biology (IGSB) at the University of Chicago, the Bionimbus community cloud is operated by the Open Cloud Consortium’s Open Science Data Cloud, and an open source version of Bionimbus is available to those who wish to set up their own clouds.
Bionimbus is designed to support next-generation gene sequencing instruments and integrates technology for analyzing and transporting large datasets. The Open Cloud Consortium (OCC) currently distributes around one petabyte of scientific data to interested users and plans to roughly double that amount in each of the next several years. Most OCC users are at universities and institutes connected to the high-speed networks Internet2 or National LambdaRail.
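To put that growth figure in perspective, here is a minimal sketch of what doubling roughly one petabyte every year would look like. The function name `projected_petabytes`, the five-year horizon and the exact growth factor of 2.0 are assumptions chosen for illustration; the OCC figures are only "around one petabyte" and "roughly double," so the output is a rough projection, not a forecast from the consortium.

```python
def projected_petabytes(start_pb: float, years: int, growth_factor: float = 2.0) -> list[float]:
    """Return the projected data volume for year 0 through `years`.

    Each year's volume is the previous year's volume times `growth_factor`.
    The default of 2.0 is an assumption standing in for "roughly double".
    """
    return [start_pb * growth_factor ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Assumed starting point: about 1 PB today, projected over 5 years.
    for year, volume in enumerate(projected_petabytes(1.0, 5)):
        print(f"year {year}: ~{volume:g} PB")
```

Under those assumptions, steady doubling would take the archive from about one petabyte to tens of petabytes within five years, which is why high-speed research networks matter to OCC users.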
Pritzkers Assist With Fundraising
Recently, Hyatt Hotels Chairman Tom Pritzker and his wife Margo hosted a fundraiser to introduce the project to about 50 influential friends. Pritzker is a university trustee and has hosted many annual dinners for the University of Chicago Medicine.
“Frankly, I’ve walked away from any one of the dinners really excited about whatever the topic was because it’s like a window into the future,” Pritzker told the Chicago Tribune. “You get to sit here, and for two hours someone is painting a picture for you of what the world is going to be like 10 to 15 years from now.”
During the fundraiser, University of Chicago computer scientist Ian Foster presented a map of global fiber-optic networks, highlighting the densely populated Chicago area. Because Chicago is a crossroads of information, the big data project hopes to leverage that geographic advantage to build the genome storage hub.
“Business, innovation, discovery, jobs still depend on taking raw materials and turning them into refined products,” Foster said. “Often, nowadays, the raw material is data and the refined material is knowledge.”
Leveraging Beagle Supercomputer
University of Chicago Computation Institute (C.I.) senior fellow and IGSB associate senior fellow Robert Grossman has been working on the Bionimbus Cloud for approximately four years. He states that it is currently one of the largest clouds to hold genomic data. It is the first project of its kind authorized by the National Institutes of Health (NIH) to use public data about genomes to perform biomedical research.
Argonne National Laboratory and IGSB are collaborating on two big data projects, using the Beagle supercomputer and the Bionimbus Cloud. The Beagle supercomputer was launched last month by the University of Chicago Biological Sciences Division and the Computation Institute. The 150-teraflop system contains 186 blades, housed in eight Cray XE6 cabinets.
With the goal of revolutionizing the way clinical researchers collect and analyze medical data, the big data projects will simulate biological processes to understand the causes of diseases such as cancer, and will compile knowledge about basic patient outcomes and recent medical discoveries to arrive at more effective diagnoses and treatments.