Seagate Gift Supports Genomic Data Storage
By Tim Stephens
UC Santa Cruz
August 20, 2015 — Santa Cruz, CA
(Photo above: Ethan Miller, professor of computer science, directs the Center for Research in Storage Systems, aka CRSS. Credit: Elena Zhukova)
The gift, valued at $250,000, includes 2.5 petabytes of storage for studying large-scale data storage challenges in genomics and other areas at UCSC.
Researchers in the Baskin School of Engineering at UC Santa Cruz are working with industry partner Seagate Technology on new ways to structure and store massive amounts of genomic data. Seagate has donated data storage devices with a total capacity of 2.5 petabytes to support this effort.
“This gift provides the basis for a major research program on storage of genomic data,” said Andy Hospodor, executive director of the Storage Systems Research Center (SSRC) at UC Santa Cruz.
“Seagate is pleased to be a part of this important research effort. The storage requirements for genomics are staggering and the potential for medical breakthroughs even larger,” said Mark Re, senior vice president and CTO at Seagate.
The gift, valued at $250,000, includes 1 petabyte of Seagate’s new Kinetic disk drives for object-based storage, plus 1.5 petabytes of traditional Seagate SATA disk drives for use in existing clusters within the UC Santa Cruz Genomics Institute.
“This gives us a large-scale test bed that we can use to explore the organization of data for large-scale disk-based storage systems. We need to develop better ways to store and organize the vast quantities of data we’re generating,” said Ethan Miller, professor of computer science and director of the Center for Research in Storage Systems (CRSS) at UCSC.
Miller and other storage systems researchers at UC Santa Cruz work closely with industry partners such as Seagate, and several of the center’s alumni and graduate students have been working at Seagate on the company’s latest disk technology. The Seagate storage donation will support research on new ways to structure and store genomic data using object stores and newly proposed open-source standards (APIs) for genomic data that are being developed by the Global Alliance for Genomics and Health.
“Genomic data storage is one of several areas of emerging interest where we’ll be looking at using Seagate’s new intelligent disks to build large-scale storage systems,” Miller said.
The donation also adds over a petabyte of storage capacity to the genomics data storage cluster maintained by the UC Santa Cruz Genomics Institute at the San Diego Supercomputer Center. For Benedict Paten, a research scientist at the Genomics Institute, it’s all about speeding up the processing of genomic data.
“We in genomics know that we have a big data problem,” Paten said. “We need to be able to compute on much larger volumes of data than we have before. The amount of genomic data is growing exponentially, and we haven’t been keeping up.”
Part of the solution, he said, is distributed processing of large data sets in which the processing is done where the data are stored, instead of downloading the data over a network for processing. “Now we can put a lot of disks on the compute nodes for efficient distributed computation over large amounts of data. This donation is really important for our big data genomics efforts at UC Santa Cruz,” Paten said.