Reed Data Scientists Want To Safeguard Your Privacy

Professors win NSF grant to find ways to protect personal information stored in giant databases.

By Ian Buckman ’18 | October 18, 2018

An interdisciplinary team of Reed researchers has won a $345,000 grant from the National Science Foundation to develop new ways to protect personal data that has become increasingly vulnerable to inadvertent exposure.

In a world where data is king, myriad organizations, from government agencies to multinational corporations to hospital systems, maintain terabytes of information in their databases. This profusion of data is a goldmine for researchers in many fields, with the potential to answer all kinds of questions, but there’s a snag. The information can often be traced back to you, even when the owner of the data deletes obvious markers such as your name, address, and social security number. In 2014, for example, the New York City Taxi and Limousine Commission released a giant database of taxi rides in response to a freedom-of-information request. The commission attempted to anonymize the data, but enterprising journalists were able to piece together various clues to identify rides taken by celebrities.

And that’s just the beginning. Most Americans can be identified by just three pieces of information: birthdate, sex, and ZIP code. With the risk of an inadvertent breach of privacy so high, many organizations have locked their invaluable data away from researchers.

Prof. Adam Groce [computer science], Prof. Andrew Bray [statistics], and Prof. Anna Ritz [computational biology] are working on a creative solution to this problem that revolves around the concept of differential privacy. In differential privacy, researchers do not have direct access to the database, but make queries through a system that adds digital “noise” to the output, allowing researchers to obtain the values they need, but not the underlying identities. Companies such as Google, Apple, and Uber are beginning to use this approach, but researchers in the social science and medical fields have thus far been slow to pick up on it for fear that the noise will skew their results.
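To make the idea of adding “noise” concrete, here is a minimal sketch in Python of one standard technique from the differential privacy literature, the Laplace mechanism applied to a counting query. It is an illustration only, not the Reed team’s method: the function names, the sample records, and the choice of the privacy parameter epsilon are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two independent exponential draws with mean `scale`
    # follows a Laplace distribution centered at 0 with that scale.
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records matching `predicate`.

    Hypothetical helper for illustration. Adding or removing one person
    changes a count by at most 1, so Laplace noise with scale 1/epsilon
    is the standard calibration for this kind of query.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.Random()
    true_count = sum(1 for record in records if predicate(record))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Made-up records, purely for illustration.
records = [
    {"zip": "97202", "heart_attack": True},
    {"zip": "97202", "heart_attack": False},
    {"zip": "97202", "heart_attack": True},
    {"zip": "97214", "heart_attack": False},
]


def in_97202_with_heart_attack(record):
    return record["zip"] == "97202" and record["heart_attack"]


# The analyst sees only the noisy answer, never the rows themselves.
noisy_answer = private_count(records, in_97202_with_heart_attack, epsilon=0.5)
print(f"Noisy count: {noisy_answer:.2f}")
```

A smaller epsilon means more noise and stronger privacy; a larger one means more accurate answers. That tradeoff is the source of the worry that noise will skew results.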

The Reed professors and their students plan to develop algorithms that will allow researchers to test specific hypotheses (does living in the ZIP code 97202 correlate with a higher chance of heart attack?) using standard statistical tools while maintaining differential privacy.

As Prof. Groce puts it: “I see the inclusion of undergraduates as one of the main reasons research is part of my job. It’s also great to get students from different fields (CS and stat) working together and sharing their expertise in such a collaborative way.”

Tackling this project requires an interdisciplinary approach. “Computer science is very good at thinking about what constitutes privacy and how to protect it,” Prof. Groce says, “while statistics studies how to usefully analyze data and has great tools for understanding how useful a particular analysis can be.”
