Seagate, UC Santa Cruz collaboration poised to accelerate genomics data analysis
August 27, 2019 — Santa Cruz, CA
The initial focus of the collaboration between Seagate and the Genomics Institute and Baskin School of Engineering at UC Santa Cruz will be accelerating the analysis of the Human Cell Atlas (HCA).
Seagate Technology, a world leader in data storage solutions, and the Genomics Institute and Baskin School of Engineering at UC Santa Cruz announced today that they have entered into a multi-year, joint research and development agreement to accelerate genomics data analysis using computational storage technology.
The initial focus of this collaboration will be on accelerating the analysis of the Human Cell Atlas (HCA), a scientist-led initiative that has emerged as a collaborative federation of diverse experts to map every type of cell in the healthy human body as a resource for studies of health and disease.
“This partnership with Seagate is an excellent example of how university-industry collaborations can accelerate meaningful research for the benefit of society,” said Alexander Wolf, dean of the Baskin School of Engineering. “Baskin Engineering’s expertise in genomics and computational biology, as well as in storage, data, and distributed systems — combined with Seagate’s computational storage technology — could lead to consequential results and far-reaching impact that might not otherwise be possible.”
The UC Santa Cruz Genomics Institute has been working with specialists in biology, computation, and medicine — including those at the European Bioinformatics Institute and the Broad Institute of Harvard and MIT — to formulate, fund, and jointly build the Data Coordination Platform for the Human Cell Atlas. Recently, the Chan Zuckerberg Initiative (CZI) funded HCA-selected “seed networks.” These projects involve 20 countries and more than 200 labs and will begin sequencing specific organs, such as the heart, eye, or liver, in the healthy human body. The resulting cellular and molecular maps will be a resource for understanding what goes wrong when disease strikes.
As these maps grow in size, traditional architectures are being strained. By leveraging Seagate’s Active Drive™ computational storage technology, the UC Santa Cruz research team hopes to increase access to these molecular maps and accelerate their analysis, reducing the time from data generation to insight and discovery.
“Our goal is to speed up the analysis from batch time scales of hours to interactive time scales of seconds or faster. If the time from question to answer for an investigator drops from hours or minutes to seconds, the entire experience and approach changes, ultimately accelerating discovery,” said Peter Alvaro, Assistant Professor, Computer Science and Engineering, UC Santa Cruz Baskin School of Engineering.
Existing sequencing techniques combine millions of cells to generate a single ‘bulk’ measurement. Recent transformative advances are enabling massively parallel single-cell sequencing of millions of individual cells, thereby increasing the size of the resulting data by several orders of magnitude. This technique is rapidly moving from research into the clinic, where reducing analysis and exploration time could ultimately accelerate precision medicine at scale.
“Today, the primary users of this architecture will be researchers around the world, as all the data in the Human Cell Atlas will be public. But once the atlas is complete, these techniques will be translated into the clinic where interactive time scales are a requirement,” said Josh Stuart, Professor, Biomolecular Engineering and Associate Director, UC Santa Cruz Genomics Institute. “Single-cell sequencing is poised to revolutionize cancer treatment, as it helps convey a detailed picture of the tumor microenvironment, which facilitates selection of combination, targeted, and precision therapies,” Stuart explained.
Seagate has a track record of developing storage systems with integrated compute capability. As part of this project, Seagate is identifying verticals and associated applications where computation can be moved closer to storage, using proximity to minimize total computation time. “The intention is to create efficiencies by bringing computation power closer to the location where data resides. As the quantity of data and computing power grows exponentially, capturing these efficiencies becomes critical to operate at the speed business and science require,” said Edward Gage, Vice President, Seagate Research Group.
Steadily declining sequencing costs and the rapid development of new techniques are leading to an explosion in genomic data ready for analysis. Historically, the last stages of genomic analysis required computation on relatively small amounts of data compared to the source genomic sequences. As the cost of sequencing has dropped to unexpectedly affordable levels, the architectures that handle the data analysis have not kept up.
“One of the highest leverage areas where new computational architectures can have an impact is at the point of analysis where insights occur and science leaps forward,” said Paul Kusbel, Senior Director of Engineering, Seagate Research Group. “We look forward to moving this application of computational storage forward with the hope that clinicians and their patients will ultimately benefit,” Kusbel elaborated.