Unintentional Disclosures of Personal Information: Leakage and Inference

Concerns about online privacy are well-founded. For example, in a 2011 study, Krishnamurthy, Naryshkin, and Wills found that 56% of the popular websites they examined leak private information to third parties (76% if site IDs are counted as private information). As increasing amounts of information are digitized and the statistical algorithms for sorting and interpreting that information improve, it becomes easier to learn about individuals by linking disparate pieces of data together.

Data aggregation and data mining are, respectively, the collection of large amounts of data from various sources and the computational analysis of that data to gain new knowledge. In many ways, data aggregation and data mining serve the public good, for example by allowing medical researchers to observe patterns and correlations that lead to better healthcare outcomes and city planners to optimize traffic flow. Liao, Chu, and Hsiao (2012) extensively detail many of the applications of data mining.

Unfortunately, both collecting and analyzing data create a risk that confidential information will be linked to personally identifying information. Sweeney (2000) showed that de-identifying data by stripping away explicit identifiers, such as names, addresses, and telephone numbers, is generally insufficient to protect against de-anonymization. While characteristics such as race, age, and zip code are not usually unique on their own, combinations of those characteristics can often be used to uniquely identify individuals. By linking data from publicly available health records with media reports, Sweeney (2013) connected the names of 43% of supposedly anonymous patients with their health records.
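
The linkage Sweeney describes can be illustrated with a minimal sketch. The records and field names below are invented; the point is only that two datasets, neither of which contains both a name and a diagnosis, can be joined on a shared quasi-identifier such as (zip code, birth date, sex):

```python
# A minimal, illustrative sketch (hypothetical data) of re-identification by
# linking quasi-identifiers: neither dataset contains both a name and a
# diagnosis, yet joining them on (zip, birth date, sex) re-associates the two.

# "De-identified" health records: explicit identifiers removed.
health_records = [
    {"zip": "02138", "birth_date": "1964-07-31", "sex": "F", "diagnosis": "hypertension"},
    {"zip": "02139", "birth_date": "1971-02-14", "sex": "M", "diagnosis": "asthma"},
]

# Publicly available records (e.g., a voter list) with names attached.
public_records = [
    {"name": "Alice Example", "zip": "02138", "birth_date": "1964-07-31", "sex": "F"},
    {"name": "Bob Example", "zip": "02139", "birth_date": "1971-02-14", "sex": "M"},
]

def link(health, public):
    """Join the two datasets on the quasi-identifier (zip, birth_date, sex)."""
    index = {(p["zip"], p["birth_date"], p["sex"]): p["name"] for p in public}
    matches = []
    for h in health:
        key = (h["zip"], h["birth_date"], h["sex"])
        if key in index:
            matches.append({"name": index[key], "diagnosis": h["diagnosis"]})
    return matches

print(link(health_records, public_records))
# [{'name': 'Alice Example', 'diagnosis': 'hypertension'}, ...]
```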

Moreover, new methods for information indexing and retrieval often lead to new ways for privacy to be compromised. Global inference attacks are one example of how data mining can be misused: attackers piece together information gathered from a variety of sources to gain a comprehensive understanding of an individual's identity and behavior. Friedland, Maier, Sommer, and Weaver (2011) explored scenarios for global inference attacks, reviewed the differing resources at an attacker's disposal, and offered reasons why such attacks might occur. Friedland and Sommer (2010) examined a particularly malicious use of inference: cybercasing, or using online tools to scope out a real-world location for questionable purposes. Using geo-tagged information publicly available online, the researchers were able to determine where users were located and when they were likely to be absent from their residences.
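
As a rough illustration of why geotags matter, the following sketch (using the Pillow imaging library; the file name is hypothetical) reads the GPS coordinates that many cameras and phones embed in a photo's EXIF metadata, which is all an attacker needs to place an uploaded image on a map:

```python
# A rough sketch of how publicly posted photos can leak location: if a JPEG
# still carries GPS EXIF tags, anyone who downloads it can recover where it
# was taken. The file name is hypothetical.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def gps_from_photo(path):
    """Return (latitude, longitude) in decimal degrees, or None if untagged."""
    exif = Image.open(path)._getexif() or {}
    gps_ifd = exif.get(34853)  # 34853 is the EXIF GPSInfo tag
    if not gps_ifd:
        return None
    gps = {GPSTAGS.get(tag, tag): value for tag, value in gps_ifd.items()}

    def to_degrees(dms, ref):
        degrees = float(dms[0]) + float(dms[1]) / 60 + float(dms[2]) / 3600
        return -degrees if ref in ("S", "W") else degrees

    lat = to_degrees(gps["GPSLatitude"], gps["GPSLatitudeRef"])
    lon = to_degrees(gps["GPSLongitude"], gps["GPSLongitudeRef"])
    return lat, lon

print(gps_from_photo("vacation_photo.jpg"))  # e.g. (37.7749, -122.4194)
```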

Studying the extent to which audio tracks could be used to match videos to their uploaders, Lei, Choi, Janin, and Friedland (2011) found that simple speaker recognition software reliably matched 66.3% of uploaders with randomly selected videos. Jernigan and Mistree (2009) studied how public information about individuals' social networks — who they associate with, and how — can reveal private information. Using a sample of Facebook data, they showed a correlation between the percentage of a user's friends who self-identified as gay males and the user's own sexual orientation.
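
A toy calculation shows the kind of feature such an inference rests on: nothing is read from the target user directly, only the publicly visible attributes of that user's friends. The graph and attribute data below are invented:

```python
# A toy sketch of the kind of inference Jernigan and Mistree describe (not
# their actual model): a sensitive attribute is never read from the target
# user, only from the public profiles of that user's friends, yet the
# resulting fraction can be a strong predictor. All data here are invented.
friends_of = {
    "user_a": ["f1", "f2", "f3", "f4"],
    "user_b": ["f5", "f6", "f7"],
}
publicly_out = {"f1", "f3", "f4"}  # friends who disclose the attribute publicly

def friend_fraction(user):
    """Fraction of a user's friends who publicly disclose the attribute."""
    friends = friends_of[user]
    return sum(f in publicly_out for f in friends) / len(friends)

for user in friends_of:
    print(user, f"{friend_fraction(user):.2f}")
# user_a 0.75, user_b 0.00: the feature that correlates with the
# (undisclosed) attribute of the user themselves.
```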

This capacity for revealing demographic characteristics can have consequences. In one study, Datta, Tschantz, and Datta (2014) explored how a user's demographics affected the ads they received: those who set their Google profile to female received fewer ads for high-paying, high-profile jobs than those who set it to male. The researchers suggest that inferring demographics could lead to discriminatory practices in other venues.

Recommended Reading

S. Liao, P. Chu, and P. Hsiao, "Data mining techniques and applications – A decade review from 2000 to 2011," Expert Systems with Applications, vol. 39, no. 12, 2012. [Online]. Available: https://www.elsevier.com/__data/assets/pdf_file/0017/97001/Data-mining-techniques-and-applications.pdf. [Accessed: 3 June 2015].

L. Sweeney, "Simple Demographics Often Identify People Uniquely," Carnegie Mellon University, Data Privacy Working Paper 3, 2000. [Online]. Available: https://dataprivacylab.org/projects/identifiability/paper1.pdf. [Accessed: 3 June 2015].

G. Friedland, G. Maier, R. Sommer, and N. Weaver, "Sherlock Holmes’s evil twin: On the impact of global inference for online privacy," in Proceedings of the New Security Paradigms Workshop, September 2011, Marin County, California. Available: International Computer Science Institute, http://www.icsi.berkeley.edu/icsi/publication_details?n=3168. [Accessed: 3 June 2015].

Additional References

L. Sweeney, "Matching Known Patients to Health Records in Washington State Data," Harvard University, White Paper, pp. 1089-1, 2013. [Online]. Available: https://dataprivacylab.org/projects/wa/1089-1.pdf. [Accessed: 3 June 2015].

G. Friedland and R. Sommer, "Cybercasing the joint: On the privacy implications of geotagging," in Proceedings of the Fifth USENIX Workshop on Hot Topics in Security, August 2010, Washington, D.C. Available: International Computer Science Institute, http://www.icsi.berkeley.edu/icsi/publication_details?n=2932. [Accessed: 3 June 2015].

B. Krishnamurthy, K. Naryshkin, and C. Wills, "Privacy leakage vs. protection measures: The growing disconnect," presented at the Workshop on Web 2.0 Security and Privacy 2011 (W2SP 2011), May 2011, Oakland, California. Available: http://w2spconf.com/2011/papers/privacyVsProtection.pdf. [Accessed: 13 June 2015].

H. Lei, J. Choi, A. Janin, and G. Friedland, "User verification: Matching the uploaders of videos across accounts," in Proceedings of the IEEE international Conference on Acoustic, Speech, and Signal Processing, May 2011, Prague, Czech Republic. Available: International Computer Science Institute, http://www.icsi.berkeley.edu/icsi/publication_details?n=3089. [Accessed: 3 June 2015].

C. Jernigan and B. Mistree, "Gaydar: Facebook friendships expose sexual orientation," First Monday, vol. 14, no. 10, 5 October 2009. [Online]. Available: http://firstmonday.org/article/view/2611/2302. [Accessed: 3 June 2015].

A. Datta, M. C. Tschantz, and A. Datta, "Automated experiments on ad privacy settings: A tale of opacity, choice, and discrimination," in Proceedings of Privacy Enhancing Technologies Symposium, July 2015. [Online]. Available: https://arxiv.org/abs/1408.6491. [Accessed: 4 June 2015].

Limitations of and Misconceptions About Privacy Tools

While many privacy protection tools exist, each is limited in its utility, and users often misunderstand the extent to which these tools will protect their privacy. For example, while disallowing cookies may make tracking more difficult, it will not prevent browser fingerprinting: the unique identification of a browser based on its configuration (such as version number and installed fonts), which is shared with websites so that information can be correctly displayed to the end user. Eckersley (2010) found that with Flash or Java installed, 94.2% of the browsers in his sample had unique fingerprints. Even when browser versions changed, Eckersley was able to re-identify 99.1% of the browsers. He concluded that browser fingerprinting needs to be considered alongside other tracking methodologies when managing user privacy.
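
The following toy sketch, which is not Eckersley's actual method, shows why this works: hashing a handful of attributes that a browser reveals in the normal course of rendering a page yields an identifier that is stable across visits and, with enough attributes, close to unique. The attribute values below are hypothetical:

```python
# A toy illustration of browser fingerprinting: hash the attributes a browser
# routinely reports. The same configuration always produces the same
# identifier, with no cookies involved. Values below are hypothetical.
import hashlib
import json

def fingerprint(attributes):
    """Hash a canonical serialization of the browser's reported attributes."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

browser = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64) Firefox/38.0",
    "screen": "1920x1080x24",
    "timezone_offset": -480,
    "installed_fonts": ["Arial", "DejaVu Sans", "Liberation Serif"],
    "plugins": ["Shockwave Flash 18.0", "Java Deployment Toolkit"],
    "cookies_enabled": False,  # blocking cookies does not change these inputs
}

print(fingerprint(browser))  # same configuration -> same identifier every visit
```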

Private browsing or "Incognito" modes suggest to many users that their information will remain private. In reality, private browsing modes merely attempt to erase a user's activities on the local computer each time the browser is closed. Aggarwal, Bursztein, Jackson, and Boneh (2010) analyzed private browsing modes in Internet Explorer, Firefox, Chrome, and Safari and determined that the protections offered not only varied considerably by browser and version but were also negligibly effective, especially when browser extensions were installed. Users may further undermine the efficacy of such tools when they do not understand how the tools actually work. In fact, one of the most important parts of using privacy-enhancing technologies effectively is understanding that they do not always provide the protection they advertise. For example, while ad blockers are marketed as preventing third-party ad companies from tracking users' online activities, Sar and Al-Saggaf (2013) reported that popular ad-blocking tools failed to reliably prevent leaks of both browsing habits and personally identifying information to third parties.

Even when tools are correctly managed, perfect protection against tracking and security breaches is impossible because the Internet is an inherently insecure system that offers no built-in privacy. Cavoukian and Kruger (2014) describe seven fundamental security problems: foolish objects that do not discriminate about whom they share information with; sporadic control of information; lack of evaluation of digital objects by the computers that accept them; the difficulty of authentication; bad actors hiding behind unaccountable pseudonyms; the ability of data administrators to access the data they hold; and the complexity of managing notification and consent. The multiplicity of devices and access points that make up the Internet necessarily creates these problems. The increasing tendency of everyday objects to be equipped with electronics that connect them to other devices or to the Internet — a phenomenon known as the Internet of Things — further contributes to security vulnerabilities. Studying the Internet of Things from a security perspective, Heer, Garcia-Morchon, Hummen, Keoh, Kumar, and Wehrle (2011) detail the technical limitations of existing Internet security protocols in maintaining user security.

Recommended Reading

P. Eckersley, "How unique is your web browser?," In Proceedings of the 10th International Conference on Privacy Enhancing Technologies, 2010, Berlin, Germany. Available: Electronic Frontier Foundation, https://panopticlick.eff.org/static/browser-uniqueness.pdf. [Accessed: 3 June 2015].

G. Aggarwal, E. Bursztein, C. Jackson, and D. Boneh, "An analysis of private browsing modes in modern browsers," in Proceedings of the 19th USENIX Conference on Security, 11-13 August 2010, Washington, D.C. Available: Stanford University, http://crypto.stanford.edu/~dabo/pubs/papers/privatebrowsing.pdf. [Accessed: 3 June 2015].

Additional References

R. K. Sar and Y. Al-Saggaf, "Propagation of unintentionally shared information and online tracking," First Monday, vol. 18, no. 6, June 2013. [Online]. Available: https://www.researchgate.net/publication/250310685_Propagation_of_unintentionally_shared_information_and_online_tracking. [Accessed: 13 June 2015].

A. Cavoukian and D. Kruger, Freedom and Control: Engineering a New Paradigm for the Digital World, Privacy by Design, Report, May 2014.

T. Heer, O. Garcia-Morchon, R. Hummen, S. L. Keoh, S. S. Kumar, and K. Wehrle, "Security challenges in the IP-based Internet of Things," Wireless Personal Communications: An International Journal, vol. 61, no. 3, December 2011.

Limitations of and Misconceptions About Privacy Regulation

Currently in the United States, the burden of privacy protection falls on the user. Consumer online privacy regulation is limited, varies from locale to locale (raising questions about whose jurisdiction applies), and is usually enforced only when there is a complaint. Nor does the government regulate its own collection and use of private online data with notably more regularity. There is an ongoing debate about how historical precedents for a right to privacy (e.g., Warren and Brandeis, 1890) apply online. Solove (2013) argued that the current policy of privacy self-management is impractical, noting that it does not generally lead to meaningful consent. Nevertheless, he considered putting privacy decision-making in the hands of lawmakers to be problematically controlling and recommended, as an alternative, the development and codification of privacy norms that would form the outer boundaries of the law.

Without effective regulation limiting the collection and sharing of consumer data, most businesses and organizations elect to use an "opt-out" model, meaning that they will collect and share user information until a user explicitly opts out. In a token nod to user privacy, terms of service agreements detailing how user information will be used have become commonplace. McDonald and Cranor (2008) investigated how much time Americans would need to thoroughly read these privacy policies and what that time is worth. By their estimate, Americans would spend about $781 billion worth of time each year merely reading privacy policies — compared to the annual value of online advertising, at that time about $21 billion. McDonald and Cranor concluded that the online advertising industry "is worth substantially less" than the time users are expected to commit to educating themselves.
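
The shape of such an estimate can be reproduced with back-of-the-envelope arithmetic. The inputs below are illustrative placeholders chosen only to land in the same ballpark as the published figure; they are not McDonald and Cranor's actual numbers:

```python
# Back-of-the-envelope arithmetic illustrating how an estimate like McDonald
# and Cranor's is built. All inputs are illustrative placeholders, not the
# authors' figures; only the structure of the calculation matters here.
policies_per_person_per_year = 1_200   # sites with policies visited per year (assumed)
minutes_to_read_one_policy = 10        # average reading time per policy (assumed)
hourly_value_of_time = 18.0            # opportunity cost of time, in dollars (assumed)
online_adult_population = 220_000_000  # U.S. Internet users (assumed)

hours_per_person = policies_per_person_per_year * minutes_to_read_one_policy / 60
national_cost = hours_per_person * hourly_value_of_time * online_adult_population

print(f"{hours_per_person:.0f} hours per person per year")
print(f"${national_cost / 1e9:.0f} billion of time nationally")
```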

Not only is the burden of privacy protection placed on users, but users also assume that they are more thoroughly protected by laws and regulations than they actually are. Hoofnagle and King (2008) surveyed Californians about the default regulations protecting the privacy of consumer data and found that a majority incorrectly believed that their information would be shared only if they explicitly gave permission. Turow, Feldman, and Meltzer (2005) surveyed people nationwide and found that this mistaken belief that laws prevent online and offline businesses from selling personal information was held across the U.S.

Recommended Reading

S. D. Warren and L. D. Brandeis, "The right to privacy," Harvard Law Review, vol. 4, no. 5, December 1890, pp. 193–220.

A. McDonald and L. Cranor, "The cost of reading privacy policies," I/S: A Journal of Law and Policy for the Information Society, vol. 4, no. 3, 2008. [Online]. Available: http://moritzlaw.osu.edu/students/groups/is/files/2012/02/Cranor_Formatted_Final.pdf. [Accessed: 3 June 2015].

Additional References

D. J. Solove, "Privacy self-management and the consent dilemma," Harvard Law Review, vol: 126, 2013. [Online]. Available: http://cdn.harvardlawreview.org/wp-content/uploads/pdfs/vol126_solove.pdf. [Accessed: 13 June 2015].

C. J. Hoofnagle and J. King, What Californians Understand About Privacy Offline, Research Report, May 2008. [Online]. Available: Social Science Research Network, https://papers.ssrn.com/sol3/papers.cfm?abstract_id=1133075. [Accessed: 3 June 2015].

J. Turow, L. Feldman, and K. Meltzer, "Open to exploitation: America's shoppers online and offline," Annenberg Public Policy Center of the University of Pennsylvania, Departmental Paper, 1 June 2005. [Online]. Available: http://repository.upenn.edu/cgi/viewcontent.cgi?article=1035&context=asc_papers. [Accessed: 3 June 2015].

Variation in Comprehension, Preferences, Concern, and Behaviors Around Privacy

Preferences and concerns regarding privacy vary between users in ways that can be divided into broad categories or personas. Many surveys have been conducted to measure users' concern about privacy. The Westin surveys divided users into categories based on their levels of concern: high concern/fundamentalist, medium/pragmatist, and low/unconcerned (Kumaraguru and Cranor, 2005). Interested in how such findings might depend on who is surveyed, Schnorf, Sedley, Ortlieb, and Woodruff (2014) compared survey sample providers and showed that privacy concerns varied across user populations. Considering that privacy concerns might differ not only in level but also in domain, Malhotra, Kim, and Agarwal (2004) developed the Internet Users' Information Privacy Concerns (IUIPC) model, which divides privacy concerns into the dimensions of collection, control, and awareness of privacy practices.

Research into changes in privacy concerns and behavior over time has shown conflicting results, particularly with respect to how users respond to current events. For example, the heightened public concern for privacy after Edward Snowden's revelation of PRISM, a vast government surveillance program that had been collecting data from major Internet companies since 2007, was extensively detailed in a Pew study: Madden (2014) described how a majority of those surveyed lacked confidence, across all the major communication channels, that their information would remain private and secure. Conversely, Preibusch (2015) gauged Internet users' level of concern about privacy by examining browser history for keywords relating to PRISM, visits to the Microsoft privacy policy page, and estimated numbers of users of privacy-enhancing technology. He concluded that the impact of the revelation of PRISM was "limited and short-lived."

However, some users express what seems to be a total lack of concern for their privacy. Examining this "unconcerned" category more closely, Spears and Erete (2014) developed a persona model based on users' trust in the organizations that collect information, awareness of privacy issues, and understanding of how those issues apply to them. This model focuses not on whether users are concerned, but on whether they should be concerned, and on what information consequently needs to be presented to them. Likewise exploring the reasoning behind user behavior, Preibusch (2014) proposed constructing privacy typologies along several dimensions and demonstrated the utility of his approach by surveying users on how they weigh privacy against functionality. Testing one such typology, Cha (2010) found that privacy concerns interacted with perceptions of interpersonal utility in predicting levels of social-media usage. Seeking to predict privacy preferences in less context-specific terms, Egelman and Peer (2015) conducted experiments around personality and found that decision-making style and risk-taking attitudes were most predictive of privacy preferences.

When users do express a preference for increased privacy, their behavior often contradicts it. Spiekermann, Grossklags, and Berendt (2001) studied the relationship between users' privacy preferences and behavior and found that a majority of users of a test shopping site gave out far more personal information than they claimed to be comfortable sharing. Similarly, Gross and Acquisti (2005), observing a sample of Facebook users' behavior, determined that the majority of those users disclosed large amounts of personal information and that few bothered to change their default privacy settings. While users may be concerned about violations of privacy, their actions do not seem to reflect that concern. To address this gap, Coventry, Jeske, and Briggs (2014) sought to develop predictive typologies that would help determine user behavior based on reported privacy preferences.

Recommended Reading

N. K. Malhotra, S. S. Kim, and J. Agarwal, "Internet users' information privacy concerns (IUIPC): The construct, the scale, and a causal model," Information Systems Research, vol. 15, no. 4, December 2004. [Online]. Available: http://csis.pace.edu/ctappert/dps/d861-09/team2-2.pdf. [Accessed: 3 June 2015].

S. Spiekermann, J. Grossklags, and B. Berendt, "E-privacy in 2nd generation e-commerce: Privacy preferences versus actual behavior," in Proceedings of the 3rd ACM Conference on Electronic Commerce, 2001, New York, NY. Available: Wijnand IJsselsteijn, http://www.ijsselsteijn.nl/slides/Spiekermann.pdf. [Accessed: 3 June 2015].

L. Coventry, D. Jeske, and P. Briggs, "Perceptions and actions: Combining privacy and risk perceptions to better understand user behaviour," in Proceedings of the Symposium on Usable Privacy and Security, July 2014, Menlo Park, CA. Available: Carnegie Mellon University, CUPS Laboratory, http://cups.cs.cmu.edu/soups/2014/workshops/privacy/s2p3.pdf. [Accessed: 3 June 2015].

Additional References

P. Kumaraguru and L. F. Cranor, Privacy Indexes: A Showcase of Westin’s Studies, Carnegie Mellon University, Technical Report, 2005. [Online]. Available: http://repository.cmu.edu/cgi/viewcontent.cgi?article=1857&context=isr. [Accessed: 3 June 2015].

S. Schnorf, A. Sedley, M. Ortlieb, and A. Woodruff, "A comparison of six sample providers regarding online privacy benchmarks," in Proceedings of the Symposium on Usable Privacy and Security: Workshop on Privacy Personas and Segmentation, July 2014, Menlo Park, CA. Available: Carnegie Mellon University, CUPS Laboratory, http://cups.cs.cmu.edu/soups/2014/workshops/privacy/s4p1.pdf. [Accessed: 3 June 2015].

M. Madden, Public Perceptions of Privacy and Security in the Post-Snowden Era, Pew Research Center, Report, November 2014. [Online]. Available: http://www.pewinternet.org/2014/11/12/public-privacy-perceptions/. [Accessed: 4 June 2015].

S. Preibusch, "Privacy behaviors after Snowden," Communications of the ACM, vol. 58, no. 5, May 2015. [Online]. Available: http://cacm.acm.org/magazines/2015/5/186025-privacy-behaviors-after-snowden/fulltext. [Accessed: 4 June 2015].

J. Spears and S. Erete, "'I have nothing to hide; thus nothing to fear': Defining a framework for examining the 'nothing to hide' persona," in Proceedings of the Symposium on Usable Privacy and Security: Workshop on Privacy Personas and Segmentation, July 2014, Menlo Park, CA. Available: Carnegie Mellon University, CUPS Laboratory, http://cups.cs.cmu.edu/soups/2014/workshops/privacy/s4p3.pdf. [Accessed: 3 June 2015].

S. Preibusch, "Managing diversity in privacy preferences: How to construct a privacy typology," presented at the Symposium on Usable Privacy and Security: Workshop on Privacy Personas and Segmentation, July 2014, Menlo Park, CA. Available: Sören Preibusch, http://preibusch.de/publications/Preibusch__SOUPS-2014_Privacy-Personas-Segmentation-WS_Privacy-typology-howto_DRAFT.pdf. [Accessed: 3 June 2015].

J. Cha, "Factors affecting the frequency and amount of social networking site use: Motivations, perceptions, and privacy concerns," First Monday, vol. 15, no. 12, December 2010. [Online]. Available: http://firstmonday.org/ojs/index.php/fm/article/view/2889/2685. [Accessed: 4 June 2015].

S. Egelman and E. Peer, "Predicting privacy and security attitudes," SIGCAS Computers and Society, vol. 45, no. 1, February 2015. [Online]. Available: https://www.icsi.berkeley.edu/pubs/networking/predictingsecurity15.pdf. [Accessed: 13 June 2015].

R. Gross and A. Acquisti, "Information revelation and privacy in online social networks," in Proceedings of the ACM Workshop on Privacy in the Electronic Society, 2005, New York, NY. Available: http://doi.acm.org/10.1145/1102199.1102214. [Accessed: 3 June 2015].

Combining Technological, Educational, and Regulatory Solutions to Privacy Problems

Privacy decisions are not made in a vacuum, and Internet service providers need to recognize and account for actual user behavior. Accounting for how people handle the trade-off between privacy and convenience is essential to designing systems that respect user privacy. Acquisti (2004) noted that patterns of user behavior around privacy were not rational; they bore more resemblance to the behavior described by economic models of immediate gratification. Acquisti argued that consumer protection should be addressed through a combination of technology, regulation, and awareness.

Delving into how to improve privacy technology, Knijnenburg (2014) observed that people's privacy disclosures vary not just in degree but in kind, and suggested tailoring disclosure profiles to individual users. With regard to regulation, Nissenbaum (2004) argued that the social norms governing whether information is appropriate to share in the first place, and whether it should then be further distributed, could be used to keep privacy policies in line with user expectations.

Beresford, Kübler, and Preibusch (2012) investigated how much awareness of privacy policies affected purchasing behavior, specifically looking at how willing DVD buyers were to pay a premium for privacy. They found not only that most buyers chose the cheaper option, handing over more personal information in exchange, but also that buyers purchased equally from the more and the less private store when there was no price difference.

However, if privacy policies are conveniently disclosed as part of the purchasing process, people will use information about privacy in decision-making. Tsai, Egelman, Cranor, and Acquisti (2011) designed an experiment in which privacy policy information was clearly and concisely displayed on a shopping search engine. They found that buyers tended to select online retailers that better protected their privacy, and some were even willing to pay a premium for that protection. In a subsequent experiment with smartphone applications that requested privacy permissions, Egelman, Felt, and Wagner (2012) confirmed that users would pay more for apps that better protected their privacy when they were able to compare two apps side by side. These results suggest that technology that builds privacy awareness into the process has the potential to influence decision-making. Squicciarini, Lin, Sundareswaran, and Wede (2014) developed one such system, which infers and then recommends privacy preferences for images uploaded to social media sites, demonstrating the feasibility of tools that enhance users' ability to manage their privacy.
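
As a toy illustration of the general idea behind such a recommender (this is not the authors' actual system), the sketch below suggests a privacy setting for a new upload based on the settings the user chose for earlier uploads with overlapping tags; all tags and settings are hypothetical:

```python
# A toy sketch of privacy-setting recommendation for image uploads: learn
# from the settings a user chose for earlier, similarly tagged uploads and
# recommend the most common choice. Tags and settings are hypothetical.
from collections import Counter

# Earlier uploads: (tags, privacy setting the user chose)
history = [
    ({"family", "kids", "birthday"}, "friends-only"),
    ({"beach", "vacation"}, "public"),
    ({"kids", "school"}, "friends-only"),
    ({"sunset", "beach"}, "public"),
]

def recommend(new_tags, history):
    """Recommend the setting most often chosen for overlapping tags."""
    votes = Counter()
    for tags, setting in history:
        overlap = len(tags & new_tags)
        if overlap:
            votes[setting] += overlap
    return votes.most_common(1)[0][0] if votes else "private"  # conservative default

print(recommend({"kids", "park"}, history))      # -> "friends-only"
print(recommend({"beach", "surfing"}, history))  # -> "public"
```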

Recommended Reading

A. Acquisti, "Privacy in electronic commerce and the economics of immediate gratification," in Proceedings of ACM Electronic Commerce Conference, 2004, New York, NY. Available: Carnegie Mellon University, https://www.heinz.cmu.edu/~acquisti/papers/privacy- gratification.pdf. [Accessed: 3 June 2015].

S. Egelman, A. P. Felt, and D. Wagner, "Choice architecture and smartphone privacy: There’s a price for that," in Proceedings of the Workshop on the Economics of Information Security, 25-26 June 2012, Berlin, Germany. Available: WEIS, http://weis2012.econinfosec.org/papers/Egelman_WEIS2012.pdf. [Accessed: 3 June 2015].

Additional References

B. P. Knijnenburg, "Information disclosure profiles for segmentation and recommendation," in Proceedings of the Symposium on Usable Privacy and Security: Workshop on Privacy Personas and Segmentation, July 2014, Menlo Park, CA. Available: Carnegie Mellon University, CUPS Laboratory, http://cups.cs.cmu.edu/soups/2014/workshops/privacy/s3p1.pdf. [Accessed: 3 June 2015].

H. Nissenbaum, "Privacy as contextual integrity," Washington Law Review, vol. 79, no. 1, February 2004. [Online]. Available: http://www.nyu.edu/projects/nissenbaum/papers/washingtonlawreview.pdf. [Accessed: 3 June 2015].

A. Beresford, D. Kübler, and S. Preibusch, "Unwillingness to pay for privacy: A field experiment," Economics Letters, vol. 117, no. 1, pp. 25-27, October 2012. [Online]. Available: ScienceDirect, http://www.sciencedirect.com/science/article/pii/S0165176512002182. [Accessed: 3 June 2015].

J. Y. Tsai, S. Egelman, L. Cranor, and A. Acquisti, "The effect of online privacy information on purchasing behavior: An experimental study," Information Systems Research, vol. 22, no. 2, June 2011, pp. 254–268. [Online]. Available: ACM Digital Library, http://dl.acm.org/citation.cfm?id=2000438. [Accessed: 3 June 2015].

A. C. Squicciarini, D. Lin, S. Sundareswaran, and J. Wede, "Privacy policy inference of user-uploaded images on content sharing sites," IEEE Transactions on Knowledge and Data Engineering, vol. 27, no. 1, January 2015, pp. 193-206. [Online]. Available: IEEE Xplore Digital Library, https://ieeexplore.ieee.org/document/6807800?arnumber=6807800. [Accessed: 23 January 2023].