Data storage, security and maintenance concerns voiced

By MARTY LEVINE

Concerned about the future of research data storage and governance at Pitt, the University Senate’s Computing and Information Technology Committee held a joint meeting with the Research Committee on Oct. 26, with the aim of urging the University to develop better guidance and resources for faculty.

The switch over the next year from Box to Microsoft’s OneDrive as the University’s main storage medium, necessitated by a four-fold increase in Box costs, is just one concern. The committees invited Rob Rutenbar, senior vice chancellor for research, to discuss the issue.

It is now very common, Rutenbar said, “that research projects, whether they are externally funded or not, have data products as part of the work — and one of the big questions around those things are, where do we put them, what are the requirements for maintenance and management?”

Data protections may be needed due to the confidentiality requirements of a researcher’s industrial partner, he noted, or because the research data contains the research subjects’ personal health information, or because some countries are restricted from seeing U.S. research data under federal law.

Researchers may have to be compliant with European Union data privacy rules, instituted several years ago, and also with the data accessibility rules of a grant funder, particularly large federal agencies such as the National Institutes of Health, he said.

In fact, the data needs to be accessible to many parties: “When you put them someplace, they need to be findable,” Rutenbar said of research data sets. “They need to be accessible. They should be inter-operable” on PCs, Macs and Unix devices. “They should be reusable, archived in ways that people can do things with them” without needing lengthy lessons to figure it out.

Some data must be used daily, he pointed out, while other data are used only intermittently or at some unknown time in the future. And data sets have entered the petabyte range; a single petabyte is 1,000 terabytes.

“It would be lovely if there was a more institutional model” for how researchers could match their data storage and data use needs with Pitt’s capabilities, Rutenbar said. “We need to build a set of easy-to-use on-ramps for all these common cases. Right now, the way we manage this is kind of an oral tradition. … There ought to be a website where you can go” for this information.

The committees heard the concerns of two large Box users. Martin Oberbarnscheidt, a faculty member in Surgery in the School of Medicine, said he does research involving animal models and has accumulated 150 terabytes of data over the past 10 years.

When he approached Pitt’s administration, he was offered a high-cost storage solution, so he decided to go his own way, purchasing his own servers and using Box to back up some of his data. “I did not feel at any time that Pitt was supporting our data storage policies and so we took care of it ourselves,” he said.

Jishnu Das, an Immunology faculty member, has needed both a large amount of data storage and lots of high-end computing. He has been using the Center for Research Computing but believes that “CRC clusters seem to be quite overloaded … and relatively slow.” He is looking for “nodes on the CRC cluster that are dedicated, rather than having these ad hoc arrangements.”

Adam Hobaugh, Pitt IT's deputy chief information officer, said his office has been working with the research office to develop guides to new storage media and offered to “white-glove everybody here” at Pitt who needs to consult about solutions to specific data use and storage issues. His office has already contacted the top 55 users of Box — who represent 60 percent of Box’s current use — to offer personal assistance.

“We raise this issue … for exactly this reason you’ve laid out,” said Michael Spring, computing committee chair. “There has been a commitment to various technologies over the past five years,” he said of Box. “I also think it’s important that commitments are made by faculty every day” to certain technologies.

“In this time of universal anxiety about everything that is going on … we need to take the anxiety” about research data “and eliminate that.” Over the next six months, he said, the University should concentrate on “where are we going, here is the timeline, here is what we’re going to give you.”

As for the timeline on the switch to OneDrive, Rutenbar said it was “a little uncertain.” But he assured the committees that, while faculty come to his office “and are sure they are the only person on campus with that problem,” their data governance questions are quite a bit more common.

He intends to create a set of common use cases, he said, so that large data users can more easily find the proper avenue for storage at Pitt, perhaps in partnership with the CRC and the Pittsburgh Supercomputing Center, in which the University is a partner.

Marty Levine is a staff writer for the University Times. Reach him at martyl@pitt.edu or 412-758-4859.
