Why does falsifying your data sometimes result in better privacy online?

WHYREDZero
Nov 7, 2021

It’s well known that large organizations are hungry for data and will go to great lengths to collect it. Why do they do this? Being data-smart has been linked to a clear increase in profit for companies that collect data and use the insights to guide business decisions. It’s all about the money.

However, it can get very invasive quickly. Personal data, like your age, shouldn’t be required to connect with your friends online. But sometimes there’s no way around it: if you want to use the service, you have to accept the data collection.

There are privacy management tools online that help you block organizations from collecting this data. Mechanisms that block tracking are sometimes included in your browser by default. However, there is no guarantee that they will stop the privacy-invasive data collection completely.

Some organizations have started using more advanced data collection methods, like supercookies, which are much harder to block. Websites can fall back on these techniques when third-party cookies are blocked.

Here’s what you should do instead: falsify the information that you give them. Lie. Here are two reasons why this works.

First, so-called smart data collection algorithms are only smart when analyzing the data as a whole to draw insights. They run on computers, and all computers are bound by the rule “garbage in, garbage out,” or GIGO: if the input is incorrect, the output is incorrect. A corollary of this rule is that a computer cannot tell when you feed it the wrong input, as long as the false value and the correct one belong to the same data class. A false age, for example, is still a number in a plausible range. You can smartly falsify your data, and the computer won’t know the difference.
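To make the “same data class” point concrete, here is a minimal sketch of the kind of check a signup form might run. The function name `is_plausible_age` and the 13 to 120 range are assumptions made up for illustration, not any real service’s rule.

```python
def is_plausible_age(age: int) -> bool:
    """Hypothetical form check: accept any age in an assumed plausible range."""
    return 13 <= age <= 120


real_age = 31      # the true value (made up for this example)
claimed_age = 47   # a false value of the same data class

# The check can reject a malformed value, but it has no way to tell
# the true age from the false one: both are valid integers in range.
results = (
    is_plausible_age(real_age),
    is_plausible_age(claimed_age),
    is_plausible_age(-5),
)
print(results)
```

Both the real and the claimed age pass; only the obviously malformed value is rejected. That is why the falsified value has to stay within the range of valid inputs.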

Second, if a human ever goes over the data and realizes that a garbage data point has made it into the database, they will tend to ignore it. Data analysts collect large enough samples that collection errors and garbage data do not affect the final insight, which is what protects the organization’s profit. Falsified data points will usually be discarded as garbage or, worse for the organization, be included in the resulting insight and cause it losses.
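The sample-size argument can be illustrated with a few lines of arithmetic. This is only a sketch on made-up numbers: 999 users who report an age of 30 and one user who reports a false age of 99.

```python
import statistics

# Made-up dataset: 999 honest entries plus one falsified entry.
ages = [30] * 999 + [99]

mean_age = statistics.mean(ages)      # shifts only slightly
median_age = statistics.median(ages)  # does not shift at all

print(mean_age, median_age)
```

One false entry among a thousand barely moves the aggregate, so an analyst has little reason to hunt it down. The individual record, which is the part that describes you, stays wrong.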

Now comes the question of what to falsify. Before we answer this question, we need to understand the distinction between structured and unstructured data. Structured data is a collection of facts that can easily be organized into rows and columns. It is easy to store, search through, and analyze. Unstructured data is everything else. Anything that you cannot organize in a clear format is unstructured.

When you’re interacting with a website, you generate a lot of data. If you click something, that’s a data point. If you upload something, that’s a data point. If you do anything else, that’s a data point too. This data is generally unstructured. But data about data, or metadata, is structured. The number of times you clicked on something, how many bytes of content you downloaded, what exact action you performed, at what time, and the result of the action can all be easily organized in a tabular format, so they are structured data.
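As a rough sketch of why this metadata counts as structured, the snippet below stores a few interaction events as rows with fixed columns and summarizes them. The `Event` fields and the sample values are hypothetical, chosen only to mirror the examples above.

```python
from collections import Counter, namedtuple

# Hypothetical log schema: one row per action, with fixed columns.
Event = namedtuple("Event", ["timestamp", "action", "target", "size_bytes", "result"])

events = [
    Event("2021-11-07T10:00:01", "click", "like_button", 0, "ok"),
    Event("2021-11-07T10:00:05", "download", "photo.jpg", 204800, "ok"),
    Event("2021-11-07T10:00:09", "click", "like_button", 0, "ok"),
]

# Because every row has the same columns, aggregation is trivial.
clicks_per_target = Counter(e.target for e in events if e.action == "click")
bytes_downloaded = sum(e.size_bytes for e in events if e.action == "download")

print(clicks_per_target, bytes_downloaded)
```

The content of the photo itself is hard to analyze, but the rows describing what you did with it fit neatly into a table.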

Data analysis is easier to conduct on structured data. So, to answer the question of what to falsify: falsify the structured data as much as possible. Click around randomly for no reason, change the coordinates of geotagged photos to random but valid locations, and so on. Change the metadata to random but valid values, if you can.
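For the geotag case, “random but valid” simply means staying inside the ranges every coordinate must fall in. Here is a minimal sketch; the function name `random_valid_coordinates` is made up, and writing the values back into a photo’s metadata would need a separate tool that is not shown here.

```python
import random


def random_valid_coordinates(rng=random):
    """Return a random (latitude, longitude) pair within valid ranges."""
    latitude = rng.uniform(-90.0, 90.0)      # valid latitudes: -90 to 90
    longitude = rng.uniform(-180.0, 180.0)   # valid longitudes: -180 to 180
    return round(latitude, 6), round(longitude, 6)


lat, lon = random_valid_coordinates()
print(lat, lon)
```

A point picked this way will often land in the ocean, so if you want the value to look believable as well as valid, picking at random from a list of real places would be a better choice.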

Know that it’s not illegal to trick someone who’s trying to trick you into giving up your privacy. It’s just a smart move, and privacy-invasive organizations can do (mostly) nothing about it.
