NSF extends funding for the Center for Research in Storage Systems five more years
By James McGirk
Baskin School of Engineering
April 9, 2019 — Santa Cruz, CA
On March 6, 2019, funding for the Center for Research in Storage Systems (CRSS) at the Baskin School of Engineering at UC Santa Cruz was extended by the National Science Foundation (NSF): CRSS is now a Phase II Industry-University Cooperative Research Center (IUCRC).
The Center’s mission is to tackle the goliath problem of data storage and retrieval in an age when the sheer quantity of data produced by social media, health care and simulation (for example) far exceeds the ability of computers to keep up with it. How do you swiftly, securely and accurately retrieve data from databases containing a million billion pieces of it?
The renewal guarantees another five years of funding from the NSF, and reflects the center’s substantial record of scientific achievement and its stream of student placements with partners in industry and academia. Recent graduates include five female PhDs, approximately half of the center’s recent PhDs, which is an unusual achievement for groups in computer systems.
“The general idea behind these centers is that they provide a place where companies can come talk to us and each other about pre-competitive things,” CRSS Director Ethan Miller, professor of computer science and engineering, said. “We care about issues that affect industry as a whole: We aren’t going to build their products; we’re not a test lab. What we do is work with them and figure out what their pain points are, and what they would like to be able to do but don’t know how.”
The NSF funding amounts to $100K per year to the center. Each corporate sponsor contributes an additional $50K. The university keeps any intellectual property developed, in exchange for allowing the sponsors to license it for free.
During the first phase of the Industry-University Cooperative Research Center, CRSS worked with between eight and twelve co-sponsors in industry at any given time. Phase II guarantees another five years of funding from the National Science Foundation, with increased expectations for success; expectations that Miller says have already been surpassed.
“We’ve graduated a lot of students who’ve done great research in a lot of areas,” Miller said. “Rekha Pitchumani, for example, was a student who graduated a couple of years ago. She worked with Seagate on providing key-value storage for a new disk technology called shingled magnetic recording (SMR). Seagate shipped a product using this technology less than a year after our research was published. Dr. Pitchumani’s research won a Best Paper Award, and her doctoral thesis on the topic won the Best Dissertation award from the Computer Science Department.”
Pitchumani eventually went to work with another CRSS sponsor, Samsung.
“That was just one. Another example of what we’ve done at CRSS is a project I did with a former student of mine for Data Domain (which is now EMC) and the University of Tennessee,” Miller said. “We created a library of erasure codes that will make our reliability mechanism much more efficient.”
Also garnering attention was the center’s work on optimization and security, as well as on long-term storage and very-large-scale storage.
“I’m told the NSF was pleased with how interdisciplinary the Center was,” Miller said. “Many Centers work on a single project per company, but we often had projects in which five or six companies were interested.”
Miller is in the process of approaching other universities to join the center for Phase II (contenders include the University of California, San Diego, which has expertise in non-volatile memory, and the University of British Columbia). Many of the current sponsors are storage providers like Seagate or Samsung. The next step, according to Miller, would be to bring in major storage users such as Google, Facebook, or Oracle.
“Informally, the biggest users of data are said to lose almost one percent of their data every year,” Miller said. “Disks are pretty unreliable; they fail at a rate of about 1-2% per year, which means, if you’ve got a million of them, that you have to replace ten thousand disks a year. Then there’s the process of actually finding and retrieving the data… Not for nothing is Facebook’s storage system called ‘Haystack.’”
Storage is more than just archiving, after all. There’s security and retrieval.
“I always like to end my talks with that scene from Indiana Jones and the Raiders of the Lost Ark,” Miller said, referring to the famous closing scene where the Ark of the Covenant is crated up and buried in an enormous government warehouse, presumably never to be seen again. “How do you secure something so that no one unauthorized can ever find nor use it against you; how can we balance these issues and still find the ark in the archive?”
It is a problem that won’t go away anytime soon. Storage, and especially retrieval, will become ever more important as society becomes enmeshed in networks of sensors.
“People often make an analogy with providing electricity,” Miller said. “But I find that inaccurate. We don’t care where our electricity comes from, but family photos are something else: you have to get back exactly what you put in. Otherwise it’s useless.”
The more of the world’s information is digitized, the more important storing it becomes. “To borrow a term from computer science, storage is our digital society’s base state,” Miller said. Without a reliable bedrock of storage, our information-saturated society would crumble into oblivion.
This article was originally published here: https://www.soe.ucsc.edu/news/nsf-extends-funding-center-research-storage-systems-five-more-years