
In one form or another, web scraping has existed ever since the internet became available to the average person. And while there were always some questions about its legality, they’re becoming more frequent as the amount of data collected through web scraping continues to grow.
Let’s take a look at specific use cases that help us understand what’s legal in the world of web scraping.
Disclaimer: this article is purely informational. The information provided below is not legal advice and shouldn’t be taken as such.
Is Web Scraping Legal?
The short answer is yes: web scraping itself is legal. That said, how the data is collected and used typically defines the legality of each specific case. For example, scraping and profiting from personal information, copyrighted content, or data that is only accessible after agreeing to a website’s Terms of Service can expose the scraper to legal liability.
In simpler terms, web scraping is legal as long as no applicable laws are broken. These laws may differ depending on where the scraped website is located, where the website, app, or service that uses the data is located, and what kind of data is collected.
What is Ethical Web Scraping?
To better understand what legal and ethical web scraping looks like, let’s take a look at the practices that distinguish ethical scrapers from ones that may not be operating legally.
- Only collects publicly available information: ethical scraping is always limited to publicly available data that isn’t hidden behind any paywalls.
- Doesn’t log into any accounts: by logging into an account on any website, you agree to follow its Terms of Service. Many such terms prohibit the use of web scrapers. Therefore, ethical scrapers only collect information that’s available without an account.
- Doesn’t scrape personally identifiable information: many data laws protect PII (Personally Identifiable Information), so even if there are regions where someone might get away with scraping and using such information, no ethical scraper would collect such data.
- Doesn’t overload servers: some scrapers may put a large load on the servers they’re scraping. Ethical scraping doesn’t hinder the performance of any app or website.
- Uses APIs when possible: in some cases, websites provide APIs (Application Programming Interfaces) that expose the data they consider to be public. This allows anyone to request data through the API instead of scraping it, which greatly reduces the risk of breaking the law or any Terms of Service, as long as the API’s own terms are followed.
The practices above ensure that ethical scrapers only collect information to make data-driven decisions. Such web scraping is legal and rarely causes any issues or concerns. Legal issues typically arise only when one or more of the practices above aren’t followed.
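The "doesn’t overload servers" practice above can be sketched in code. The following is a minimal illustration, not a definitive implementation: the names `PoliteThrottle` and `fetch_all` are hypothetical, the `fetch` argument stands in for whatever download function a scraper already uses, and the right delay depends on the site being scraped.

```python
import time
from typing import Callable, Iterable, List, Optional


class PoliteThrottle:
    """Enforce a minimum delay between consecutive requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last_request: Optional[float] = None  # time of the previous request

    def wait(self) -> None:
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last_request is not None:
            remaining = self.min_interval - (now - self._last_request)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last_request = now


def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], str],
    throttle: PoliteThrottle,
) -> List[str]:
    """Fetch each URL in turn, pausing between requests so the server isn't flooded."""
    results = []
    for url in urls:
        throttle.wait()
        results.append(fetch(url))
    return results
```

With a two-second interval, for example, `fetch_all(urls, fetch=my_download, throttle=PoliteThrottle(2.0))` would send at most one request every two seconds, where `my_download` is a placeholder for your own function. The clock and sleep functions are passed in as parameters only so the pacing logic can be checked without actually waiting.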
Why does Web Scraping get a bad rap?
The main reason web scraping can be viewed as shady or illegal is that not everyone scrapes the web ethically. The internet has its fair share of hackers and other malicious actors engaging in illegal activities. Unfortunately, they draw negative attention to otherwise perfectly legal activities such as web scraping.
What is Ethical Web Scraping used for?
All web scraping involves extracting large amounts of data from the web. What differentiates legal practices from the not-so-legal ones is mostly what the data is used for.
Ethical web scraping allows businesses and users to conduct research based on large amounts of information and make optimal, data-driven decisions. Some of the most common uses for ethical web scraping are price monitoring and comparison, academic research, ad verification, and data analysis.
Of course, you must also use legal, ethical tools for web scraping. If you’re using a service, make sure it follows all of the best practices. And if you’re planning to set up a web scraper of your own, it’s best to use ethically sourced residential proxies for doing so.
Privacy laws that regulate Web Scraping
There are no laws aimed directly at web scraping, but that doesn’t mean that users aren’t protected from unethical practices. The legality of web scraping is typically defined by data protection and privacy laws. Here are some of the laws you’re most likely to encounter:
GDPR
GDPR (General Data Protection Regulation) is an EU law that was adopted in 2016 and took effect on the 25th of May, 2018. It’s been called the strictest data privacy and security law in the world and aims to give people in the EU control over their personally identifiable information.
It’s important to know that GDPR protects the personal data of people in the EU, regardless of where the data collector is based. So even if a web scraper operates outside of the EU, it still can’t lawfully collect PII about people in the EU without a valid legal basis. This law is arguably the main reason why ethical scrapers avoid collecting any PII, even from websites operating outside of the EU.
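One way a scraper might keep obvious PII out of its stored data is to redact it before saving anything. The sketch below is only an illustration under simple assumptions: `redact_pii` is a hypothetical helper, and the two patterns are deliberately naive. They catch common email and phone formats, will miss other kinds of PII (names, addresses, IDs), and can also flag harmless digit sequences such as dates, so this is not a compliance tool.

```python
import re

# Naive patterns chosen for illustration; real PII detection needs far more care.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_pii(text: str) -> str:
    """Replace email addresses and phone-like numbers with placeholders."""
    text = EMAIL_RE.sub("[redacted email]", text)
    text = PHONE_RE.sub("[redacted phone]", text)
    return text
```

For instance, `redact_pii("Contact jane.doe@example.com or +1 (555) 123-4567 for pricing.")` returns `"Contact [redacted email] or [redacted phone] for pricing."`, while a string like `"Price: 19.99 EUR"` passes through unchanged.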
COPPA
COPPA (Children’s Online Privacy Protection Act) is a US federal law that requires online services, websites, and apps that are directed at children under the age of 13, or that knowingly collect their data, to protect those children’s personal data. This law applies to services based in the US, as well as services with users based in the US.
This means that scraping and using any data about children under the age of 13 may be deemed illegal. Once again, the best way to avoid breaking this law is to avoid collecting any personal information about users whatsoever.
CFAA
CFAA (Computer Fraud and Abuse Act) is a law that has played a role in some web scraping-related lawsuits in the past. This US law was enacted in 1986, long before most modern digital practices existed. However, it clearly states that accessing a computer system without authorization is illegal.
Courts have since held that scraping publicly available information generally doesn’t count as unauthorized access. That said, scrapers need to be careful about bypassing any paywalls or agreeing to Terms of Service that would make some information private. If anyone scrapes private data, it can be treated as a CFAA violation.
CCPA
CCPA (California Consumer Privacy Act) is a California state law quite similar to GDPR. It gives California residents the right to know what personal information a business collects about them and how that data is used. They can also request that any collected data be deleted and can opt out of the sale of their personal data.
If a scraper were to collect data of users who have opted out of sharing their personal information, it could be considered a CCPA violation.
Web Scraping Legal Cases
With a variety of laws protecting personal data, it’s no surprise that there have been quite a few legal cases related to web scraping over the years. Let’s take a look at some of the biggest cases from recent years.
X vs. Bright Data
In 2023, X, formerly known as Twitter, sued Bright Data for scraping data from its website and potentially benefiting from it. X stated that, by doing so, Bright Data had violated its Terms of Service and copyright laws.
Interestingly enough, Meta had filed a similar lawsuit against Bright Data. However, Meta lost the legal battle against the data collection company and dropped the lawsuit.
X’s case wasn’t much different. On May 10, 2024, a federal court in California ruled in favor of Bright Data, stating that all information collected by Bright Data was publicly available at the time of collection and visible to everyone without a login.
Ryanair vs. PR Aviation
In the 2015 case of Ryanair vs. PR Aviation, Ryanair claimed that PR Aviation broke its ToU (Terms of Use) by scraping data from its website. However, the court decided in favor of PR Aviation.
Ryanair’s ToU does state that extracting data from the website for commercial purposes is prohibited. However, the court ruled that no legal contracts between Ryanair and PR Aviation were broken. That’s because the scraped information was publicly available without agreeing to the ToU.
LinkedIn vs. hiQ Labs
The case of LinkedIn vs. hiQ Labs, a now-defunct data science company, provides a lot of insight into the legality of web scraping. Initially, LinkedIn allowed hiQ to collect data but had a change of heart when it launched a similar tool of its own that could replace hiQ’s product.
That said, LinkedIn had to withdraw its cease and desist letter to hiQ after a court ruled that it wasn’t lawful. LinkedIn appealed the decision, claiming that hiQ had violated the CFAA when scraping data, but the court ruled in favor of hiQ once more, based on the fact that hiQ only scraped public data.
But that wasn’t the end of the saga just yet. In 2022, the court ruled that hiQ’s use of fake accounts had breached LinkedIn’s Terms of Service. This led to a settlement in the case, which also required hiQ Labs to stop any form of web scraping from LinkedIn.
This lawsuit suggests that web scraping is generally considered legal as long as only public data is scraped. By using accounts, hiQ Labs agreed to LinkedIn’s ToS and gained access to data that was behind a login wall. By doing so, they essentially agreed not to scrape data, something that may not have been the case if they hadn’t used accounts.
Conclusion
Looking at the laws and real-life examples of legal cases, one thing is clear: web scraping is legal, as long as it’s done ethically. It’s a useful tool that allows businesses to make data-driven decisions and can benefit everyone involved.
Ethical web scraping excludes personally identifiable information, only collects public data, and doesn’t impact the performance of the websites it scrapes. The fact that cybercriminals don’t follow such practices doesn’t make web scraping itself illegal.