Web Scraping: Understand How Third Parties Access Data You Expose

Web scraping is a technique that automates the collection of data from a website or web application. Although it has been common practice since the emergence of the Internet in the 1990s, it recently drew renewed attention after the Deep Social marketing agency allegedly used it to copy data from over 235 million Instagram, TikTok, and YouTube profiles.

The case exposed how a tool that researchers and journalists widely use for legitimate purposes can be exploited to violate the privacy of social network users. Below, learn what scraping is, what it is used for, and what risks are involved.

What is Web Scraping?

Web scraping is an automated information collection technique that retrieves data publicly available on websites. It is usually used to speed up the lookup and collection of data from public sources.

Although the information obtained is open, collecting it manually would be far less efficient than scraping. The technique relies on programming languages, applications, and scripts to collect data on a large scale, simplifying the work of extracting and classifying this information.
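To make the idea concrete, here is a minimal sketch of what such a script might look like, using only Python's standard library. The HTML snippet, the class names, and the `scrape_profiles` helper are hypothetical and invented for illustration; a real scraper would first download the page over HTTP and would have to adapt to the actual structure of the site.

```python
from html.parser import HTMLParser

# Hypothetical public page, embedded here so the sketch runs offline.
SAMPLE_HTML = """
<ul>
  <li class="profile"><span class="name">Ana</span><span class="followers">1200</span></li>
  <li class="profile"><span class="name">Bruno</span><span class="followers">87</span></li>
</ul>
"""


class ProfileParser(HTMLParser):
    """Collects the name and follower count from each profile entry."""

    def __init__(self):
        super().__init__()
        self.profiles = []
        self._field = None  # field currently being read, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "profile":
            self.profiles.append({})  # start a new record
        elif tag == "span" and css_class in ("name", "followers"):
            self._field = css_class

    def handle_data(self, data):
        # Only keep text that sits inside one of the spans we care about.
        if self._field and self.profiles:
            self.profiles[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None


def scrape_profiles(html):
    """Extracts and classifies the profile data found in the page."""
    parser = ProfileParser()
    parser.feed(html)
    return [
        {"name": p["name"], "followers": int(p["followers"])}
        for p in parser.profiles
    ]


if __name__ == "__main__":
    for profile in scrape_profiles(SAMPLE_HTML):
        print(profile)
```

The same few lines work for two profiles or two million, which is what makes the technique so much faster than copying data by hand.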

What is Web Scraping used for?

Web scraping can be an important tool for researchers, data scientists, and journalists, among other professionals. For example, the technique can automate the collection of data from a public Federal Government database for use in a news report or a study. Researchers in fields such as communication and politics can also use scraping to obtain open data about public figures on social networks such as Twitter.

Marketing professionals and agencies can also use the technique. In these cases, the data is usually used to segment campaigns and make advertising more effective at reaching its target audience.

Scraping Risks

The risk of scraping lies in where the collected data ends up and what it is used for. Besides legitimate professionals, malicious actors can use the technique to enable scams and fraud, or to hypersegment advertising and political campaigns beyond what users anticipated.

One of the best-known cases of hypersegmentation was the Cambridge Analytica scandal, in which former employees of the company claimed to have used data from Facebook profiles to create behavioral maps of American voters. U.S. lawmakers and even a social network executive allege that the company’s actions may have influenced the outcome of the 2016 presidential election.

Is Scraping illegal?

Scraping is not necessarily considered illegal. It usually collects information that is openly available on the platforms and therefore accessible to anyone on the network. Thus, just as a user is free to open a social network profile and write down a person’s data, it is not a crime to do the same across many pages with an automated system.

The practice does, however, violate the terms of use of most social networks, such as Facebook, Instagram, TikTok, and YouTube. They all prohibit copying data stored on their platforms through automated mechanisms.

What data can unknown people and companies have access to?

By scraping a social network, it is possible to access public profile data such as profile photos, e-mail addresses, phone numbers, age, and gender, as well as an account's follower count and the engagement on its posts.

It is also possible to collect posts, shared links, and any other material open to the public, as long as the platform offers a way to access it. In general, this is done through an API, an interface that connects the scraping software to the site from which the data will be collected. The main social networks also ask the user to decide whether a given piece of software may access the requested data.
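The sketch below illustrates, in simplified form, how user consent might shape what an API returns. It is not the API of any real social network: the scope names, the `SCOPE_FIELDS` mapping, and the `api_response` function are all assumptions made for this example.

```python
import json

# Hypothetical mapping from a permission scope to the profile fields it unlocks.
SCOPE_FIELDS = {
    "basic": {"username", "followers"},
    "contact": {"email", "phone"},
}


def api_response(profile, granted_scopes):
    """Returns, as JSON, only the fields covered by the scopes the user granted."""
    allowed = set()
    for scope in granted_scopes:
        # Unknown scopes unlock nothing.
        allowed |= SCOPE_FIELDS.get(scope, set())
    visible = {key: value for key, value in profile.items() if key in allowed}
    return json.dumps(visible, sort_keys=True)


if __name__ == "__main__":
    # Made-up profile data for demonstration only.
    profile = {
        "username": "ana",
        "followers": 1200,
        "email": "ana@example.com",
        "phone": "555-0100",
    }
    print(api_response(profile, ["basic"]))
    print(api_response(profile, ["basic", "contact"]))
```

In this model, software that was only granted the "basic" scope never sees the e-mail address or phone number, even though both exist on the platform.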

How to Avoid Problems with Scraping

Although most social networks try to block automated data collection on their platforms, some bots manage to bypass the filters and reach public user accounts.
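One simple kind of filter limits how many requests a single client may make within a time window. The sketch below is an assumption about how such a filter could work, not a description of what any particular platform does; the `RateFilter` class and its parameters are hypothetical.

```python
from collections import defaultdict, deque


class RateFilter:
    """Rejects a client that exceeds max_requests within window_seconds."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._history = defaultdict(deque)  # client id -> recent timestamps

    def allow(self, client_id, timestamp):
        recent = self._history[client_id]
        # Forget requests that have fallen out of the time window.
        while recent and timestamp - recent[0] >= self.window_seconds:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too many recent requests: block this one
        recent.append(timestamp)
        return True


if __name__ == "__main__":
    limiter = RateFilter(max_requests=3, window_seconds=10)
    # A fast bot sends four requests within three seconds.
    for second in (0, 1, 2, 3):
        print(second, limiter.allow("fast-bot", second))
```

A filter like this stops the fast bot on its fourth request, but a bot that slows down or spreads its requests across many client identifiers stays under the limit, which is one way automated collection can slip through.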

The Cambridge Analytica investigation, for example, revealed that the company had obtained millions of records to which it should not have had access. These included information about the friends of people who had consented to the collection. Facebook has since said that it corrected the flaw and prevented the same vulnerability from being used again.

Therefore, the most effective way to defend against web scraping is to keep as much of your profile information private as possible, restricting posts and personal data to followers or friends only, depending on the network.

Social networks also need to offer acceptable levels of data protection, especially since the General Data Protection Law came into force. The law requires transparency, the deletion of unnecessary data, and the application of the principle of privacy by design, which aims to prevent privacy violations before failures occur.
