Researchers from Stanford and Princeton universities have raised further questions about privacy on the internet. In a recently released study, they show that anonymous web browsing histories can be traced back to the people who generated them by linking those histories to public social media profiles.
The paper is due to be presented at the World Wide Web Conference in Perth, Australia, in April 2017. The researchers note that social media accounts on platforms such as Facebook, Twitter or Reddit can be linked to browsing histories.
One of the authors of the research article, Arvind Narayanan, an assistant professor of computer science at Princeton, noted that companies such as Facebook and Google already track users online and know their identities, although these companies do disclose their tracking. Narayanan added that the research shows that anyone with access to browsing histories can identify many users by analyzing public information from their social media accounts, and many companies and organizations already have that access.
Narayanan is also an affiliated faculty member at Princeton’s Center for Information Technology Policy. He cautions that although users may assume they are anonymous when browsing a health or news website, the research has identified another method by which tracking companies could learn their identities.
Narayanan noted that the Federal Communications Commission recently adopted new privacy rules for internet service providers. These rules allow ISPs to store and use consumer information only when it is “not reasonably linkable” to individual users. He suggests that the team’s results show that pseudonymous browsing histories fail this test.
Many online advertising companies compile users’ browsing histories with tracking programs that are embedded in webpages.
Although some of these companies link identities to these profiles, most promise that the web browsing information is not linked to an identity. The researchers wanted to find out whether a user could still be identified from a web browsing history that contains no identifying information.
To investigate this, they limited themselves to publicly available information. The most promising source was social media profiles, especially those that include links to external webpages. The team created an algorithm that compares the links appearing in people’s public social media feeds with anonymous web browsing histories.
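The article does not spell out the researchers’ matching model, so the following is only a rough illustration of the general idea, not their method. It ranks candidate profiles by how many links from an anonymous history also appear in each profile’s feed, giving rarer links extra weight on the assumption that they are more identifying. The inverse-frequency weighting, the function name `rank_candidates`, and the sample profiles and URLs are all hypothetical.

```python
import math
from collections import Counter


def rank_candidates(history, feeds):
    """Hypothetical sketch: rank profiles by how well their links match a history.

    history: URLs from an anonymous browsing history (assumed input format).
    feeds: dict mapping a profile name to the set of URLs in that profile's feed.
    Returns (profile, score) pairs, best match first.
    """
    n = len(feeds)
    # How many feeds contain each URL (used to down-weight very common links).
    df = Counter(url for links in feeds.values() for url in set(links))
    scores = {}
    for profile, links in feeds.items():
        score = 0.0
        for url in set(history):
            if url in links:
                # Assumed weighting: a link seen in few feeds is more telling.
                score += math.log(n / df[url]) + 1.0
        scores[profile] = score
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Made-up example data for illustration only.
feeds = {
    "@alice": {"a.com/1", "b.com/2", "news.com/x"},
    "@bob": {"news.com/x", "c.com/3"},
    "@carol": {"news.com/x", "d.com/4"},
}
history = ["a.com/1", "news.com/x", "b.com/2", "e.com/5"]
print(rank_candidates(history, feeds)[0][0])  # prints @alice
```

Here a widely shared news link barely separates the candidates, while the two rarer links in the history point strongly to one profile, which echoes the point that a browsing history contains revealing signs of its owner.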
Another author of the study, Sharad Goel, an assistant professor at Stanford, noted that every person’s browsing history is unique and contains revealing signs of their identity.
The algorithm identifies patterns shared by the two kinds of data and uses them to identify users. The researchers acknowledge that the method is not yet perfect, because it requires a social media feed that includes a number of links to outside sites. Even so, given a browsing history containing 30 links originating from Twitter, they were able to deduce the corresponding Twitter profile more than 50 percent of the time.
Even greater success was achieved in an experiment involving 374 volunteers who provided their web browsing data. More people originally took part in the study, but some were excluded because of technical problems in processing their data. By comparing the browsing data of the remaining users with hundreds of millions of public social media feeds, the researchers identified more than 70 percent of them.
According to Yves-Alexandre de Montjoye, an assistant professor at Imperial College London, the research demonstrates that anyone who knows how to code can easily build a full-scale de-anonymizer that uses only what is already publicly available.
De Montjoye, who was not involved in the project, commented that the evidence accumulated over the years, including this study, shows the strong limits of data anonymization and emphasizes the need to rethink the general approach to data protection and privacy in the age of big data.