Chad Mills

Tech Companies Collecting Personal Data Keeps You Safe

Big tech companies like Google, Facebook, Amazon, and others all have liberal data collection practices. Most popular discussion of this focuses on privacy concerns, particularly about these companies using personal data for ad targeting.

I want to talk about how collecting personal data is critical for keeping you safe online. That said, since the main concern people have is around personal data for ads, I’ll first address this more common objection to using personal data.

Common Concern: Personal Data for Ads

This is not an unreasonable concern. When you choose what services you use, it’s very reasonable to care about how the private information you provide will be secured and used.

I personally prefer having relevant ads targeted at me rather than seeing the things that work best on generic audiences. Most people don’t see these anymore, but recall what a generically-targeted banner ad on Yahoo would look like:

Meanwhile, here’s a targeted ad that was the first one I saw in my Facebook feed this morning:

The Facebook ad might not seem interesting to you, but as someone who enjoys photography, reads news stories relating to astronomy, and occasionally clicks into posts about astrophotography, it’s something I find pleasing to look at even if I wouldn’t purchase this particular product.

On Facebook, most ads I see—like this one—aren’t ones I act on. Nonetheless, even when I’m not buying something, most seem relevant and I still find myself saving some to remind myself of interesting products or services I didn’t realize existed before.

Beyond that, though, from Facebook ads I’ve bought everything from interesting lectures on business and online photography classes to more mundane things like Star Wars shirts. More importantly, I’ve gotten ideas for great developmental toys for my daughter that I wasn’t aware existed before.

I’ve also occasionally found them creepy. I can understand why some people are bothered enough to not want to use the product at all. They may even prefer to pay for the service instead of seeing ads—an option I think all the big tech companies should provide for privacy-conscious users.

Even though this use of personal data for ads is what gets all the attention, data is the most valuable resource in the information age, so it is important to understand that the data is used for more than just advertising.

Online Abuse

Free services on the scale we have them today are a new phenomenon. When you use Google or Facebook, computers in their data centers are consuming power and performing work on your behalf.

An online storage account like Dropbox or an email account on Gmail may seem perfectly innocuous. Nonetheless, abusers can automate the creation of massive numbers of accounts.

They can use many Dropbox accounts to store backups of people’s computers, a service which they can charge money for without bearing most of the costs.

They can sign up for millions of fake email accounts and launch massive spam campaigns—without having to pay for mail servers or build up a good reputation as a mail sender, which is important to getting email past spam filters.

These services are free, and wherever users get something of value for free, bad guys will almost inevitably find a way to use a bunch of fake accounts to cover a critical expense in a business they run.

Personal Data Keeps You Safe

Knowing that a user is a real person, and not part of a massive attack, is a key enabler for these services to separate out and take action against the malicious users.

It helps the companies be more aggressive at stopping potential abuse since known legitimate users can be excluded from such filtering. And, perhaps more importantly, for a typical user it also makes the experience better.

When you sign up for a free service, there may be limits to activity. For example, after creating a new email account you may only be able to send a handful of emails each day.

These limits are typically set in a way that most legitimate new users don’t get limited, but new accounts created for abusive purposes do. So, having information that indicates you’re a real person can result in these limits being raised or removed.

There are many types of personal information that can help with this. Proving to Google that you have a cell phone number that can accept text messages is one strong sign you’re a real person; bad guys can still buy phone service for each fake account they sign up, but this is very costly and time-consuming.
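To make the idea concrete, here is a minimal sketch of how a sending limit might be raised once an account shows a sign of being a real person. The limits, the `Account` fields, and the function names are all made up for illustration; this is not how Google or any other company actually implements it.

```python
from dataclasses import dataclass

# Hypothetical limits, chosen only for illustration.
NEW_ACCOUNT_DAILY_LIMIT = 10
VERIFIED_DAILY_LIMIT = 500


@dataclass
class Account:
    phone_verified: bool = False  # assumed signal: confirmed a code sent by text message
    sent_today: int = 0           # emails already sent today


def daily_send_limit(account: Account) -> int:
    """Return a higher daily limit when the account has a real-person signal."""
    if account.phone_verified:
        return VERIFIED_DAILY_LIMIT
    return NEW_ACCOUNT_DAILY_LIMIT


def can_send(account: Account) -> bool:
    """Allow another email only while the account is under its daily limit."""
    return account.sent_today < daily_send_limit(account)
```

In this sketch, a brand-new account that has already sent 10 emails is blocked for the day, while the same account with a verified phone number is not. A real system would likely combine many signals rather than rely on a single flag.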

Patterns of activity, including the fact that you’ve had mutual interactions with other legitimate users, can also help establish that you’re a legitimate user.

For example, if you’ve sent emails to a bunch of other users, most of which are known to be legitimate, and you typically get replies, that’s a strong sign you’re also a real person.

Fake accounts can send one another email to appear like reasonable users, so it takes more than just activity. It’s mutual interactions with other known legitimate users that really matter.
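As a rough illustration of why mutual interactions with known legitimate users are harder to fake than raw activity, the sketch below scores an account by the fraction of its recipients who are both known to be legitimate and have replied. The function name, the data layout, and the sample accounts are assumptions made for this example; real abuse-detection systems are far more involved and are not publicly documented.

```python
def mutual_interaction_score(account, sent, replied, known_legit):
    """Fraction of an account's recipients who are known-legitimate AND replied.

    sent:        dict mapping a sender to the set of accounts it emailed
    replied:     dict mapping a user to the set of accounts it replied to
    known_legit: set of accounts already believed to be real people
    """
    recipients = sent.get(account, set())
    if not recipients:
        return 0.0
    trusted_repliers = {
        r for r in recipients
        if r in known_legit and account in replied.get(r, set())
    }
    return len(trusted_repliers) / len(recipients)


# Hypothetical data: alice talks to real people, fake1 and fake2 only talk to each other.
sent = {
    "alice": {"bob", "carol", "dave"},
    "fake1": {"fake2"},
    "fake2": {"fake1"},
}
replied = {
    "bob": {"alice"},
    "carol": {"alice"},
    "fake1": {"fake2"},
    "fake2": {"fake1"},
}
known_legit = {"bob", "carol"}

alice_score = mutual_interaction_score("alice", sent, replied, known_legit)
fake_score = mutual_interaction_score("fake1", sent, replied, known_legit)
```

Here the two fake accounts reply to each other, but neither is in the known-legitimate set, so their activity earns no credit. Alice gets credit for the two legitimate users who replied to her.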

Just as artificial intelligence algorithms are used to show you relevant content in your News Feed or to provide great recommendations on Netflix, they are also used to identify malicious activity and keep you safe.

Privacy Policies

Google’s Privacy Policy includes this short explanation of how they use your data for security purposes:

We use automated systems that analyze your content to provide you with things like customized search results, personalized ads, or other features tailored to how you use our services. And we analyze your content to help us detect abuse such as spam, malware, and illegal content. We also use algorithms to recognize patterns in data.

– Google’s Privacy Policy

These policies are typically not very detailed. They are written to give clear examples of how the data collected is critical to providing essential functionality of the service. For detecting abuse, the authors of these policies have to be careful not to give away information about the defense mechanisms that could help the bad guys work around the systems.

So, you won’t hear this discussed in much detail from the big tech companies. Nonetheless, your data is not just used for ad targeting but also for other purposes critical to running a viable service.

Even if it were possible to change the business model to where personal data wouldn’t get used for advertising, much of this same data would still get collected for other essential purposes like keeping the services secure and keeping your inbox free from spam.



Chad currently leads applied research, ML engineering, and computational linguistics teams at Grammarly.

He's previously led ML and data science teams at companies large and small, including working on News Feed at Facebook and on Windows and Outlook at Microsoft.
