Data Sharing & Data Mining
Table of Contents
- What Is Data Mining?
- Data Sharing and the Internet Business Model
- Your Data Is a Valuable Commodity
- Build Your Toolkit
What Is Data Mining?
You have probably heard about Big Data and data mining in the news. Data mining means searching for patterns and relationships in large collections of data, often called Big Data. People have always analyzed data for patterns, but computers have made collecting, storing, and analyzing data far more efficient and commonplace. Through the process of automatic inference, these patterns are used to find trends, draw conclusions, and sometimes even predict future patterns.
Data mining is often used in ways that benefit everyone. In healthcare, it helps identify disease outbreaks. For example, in Rwanda, a Harvard researcher used data mining to track people’s patterns of movement and compare them to health statistics. He discovered that movement patterns changed two weeks before a cholera outbreak, and as a result, he was able to infer when an outbreak would occur. Data mining is also used by governments to optimize traffic flow, by businesses to analyze customer buying patterns, and by law enforcement agencies to solve crimes.
Although linking data with other data increases its value, it also raises privacy concerns. Healthcare institutions, civic agencies, and other organizations that only want to study patterns often take steps to anonymize data. At a minimum, they will usually strip away personally identifiable information (PII), such as names and Social Security numbers, that can be used to uniquely pick out an individual. However, even characteristics that aren't distinctive by themselves, like age or marital status, can be combined with other information to pick out the one person with that unique combination of characteristics. In some cases, agencies may use more elaborate computational methods to obscure identity. Such strategies often rely on assigning someone to a general category that covers more people, such as saying that someone is in the age range 20-29 rather than saying they are 27. However, even these techniques aren't foolproof, and many agencies do not use them at all.
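To make the idea concrete, here is a minimal Python sketch of the age-range strategy described above. It is an illustration only, not the method any particular agency uses. The records, the "zip" field, and the function names (age_range, generalize, count_unique) are all invented for this example. The sketch counts how many people can be singled out by their combination of characteristics, before and after exact ages are replaced with ranges.

```python
from collections import Counter

# Hypothetical records with names already stripped; every value is made up.
records = [
    {"age": 27, "marital_status": "single", "zip": "94704"},
    {"age": 24, "marital_status": "single", "zip": "94704"},
    {"age": 27, "marital_status": "married", "zip": "94704"},
    {"age": 52, "marital_status": "married", "zip": "94110"},
    {"age": 58, "marital_status": "married", "zip": "94110"},
]

def age_range(age):
    """Generalize an exact age to a ten-year range such as '20-29'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"

def generalize(record):
    """Return a copy of the record with the exact age replaced by a range."""
    out = dict(record)
    out["age"] = age_range(record["age"])
    return out

def count_unique(rows):
    """Count rows whose combination of characteristics appears exactly once."""
    combos = Counter(tuple(sorted(row.items())) for row in rows)
    return sum(1 for n in combos.values() if n == 1)

print(count_unique(records))                           # 5
print(count_unique([generalize(r) for r in records]))  # 1
```

In this toy data, every person is unique when exact ages are kept. After generalizing, most people share their combination with someone else, but one person still stands out, which is one way such techniques can fall short.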
Data Sharing and the Internet Business Model
Sharing your information is a given when you do business or use a service. Businesses mine this data to figure out who their customers are and what they're buying—and they also frequently share customer data with third-party advertisers or sell it to data brokers. For companies that provide free online services, selling ad space and user data is how they make most of their money.
Your Data Is a Valuable Commodity
Data aggregators, also known as data brokers, specialize in learning everything they can about consumers. There are few limits on what data brokers can do with this information, though they don't generally make all of it publicly available for free. Data brokers combine information from companies you do business with, online tracking data, and public records from government websites to make a consumer profile with a unique ID, which may in turn be sold to an advertising company. These profiles may or may not be linked with your name, but even an "anonymous" profile compiled by a data broker probably has enough information to uniquely identify you.
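The following Python sketch shows, in a simplified way, how records from separate sources could be merged into one profile with a unique ID. It is an assumption-laden illustration, not a description of how any real data broker works. The sample data, the "match_key" field used to link records, and the functions profile_id and build_profiles are all hypothetical.

```python
import hashlib

# Hypothetical data from three kinds of sources; every value is invented.
# "match_key" stands in for whatever detail lets records be linked.
store_purchases = [
    {"match_key": "pat@example.com", "purchase": "lotion"},
    {"match_key": "Pat@Example.com", "purchase": "vitamins"},
    {"match_key": "sam@example.com", "purchase": "garden hose"},
]
tracking_events = [
    {"match_key": "pat@example.com", "site_visited": "news.example"},
]
public_records = [
    {"match_key": "pat@example.com", "city": "Springfield"},
]

def profile_id(match_key):
    """Derive a stable ID from the matching key, ignoring letter case."""
    digest = hashlib.sha256(match_key.lower().encode("utf-8")).hexdigest()
    return digest[:12]

def build_profiles(*sources):
    """Merge rows from all sources into one profile per ID."""
    profiles = {}
    for source in sources:
        for row in source:
            profile = profiles.setdefault(profile_id(row["match_key"]), {})
            for field, value in row.items():
                if field == "match_key":
                    continue  # the profile itself carries no name or email
                profile.setdefault(field, set()).add(value)
    return profiles

profiles = build_profiles(store_purchases, tracking_events, public_records)
for pid, profile in profiles.items():
    print(pid, profile)
```

Even though the merged profile stores only an ID, it gathers purchases, browsing, and location in one place, which is why a profile without a name can still point to one person.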
The more advertisers know about you, the better they can predict your future behavior. In one notorious example, Target inferred through data mining that women who bought unscented lotion were more likely to be pregnant, and started sending them ads for baby products. More broadly, data brokers and online advertising services have made a multimillion-dollar business of tracking every time you buy something or go online. They use online tracking to gather data from social media sites, shopping sites, and news and entertainment sites, and then mine that data for patterns that indicate your interests. Advertisers can then display ads based on your interests, as indicated by your online (and offline) behavior.