Guidelines for web scraping

Best regards

Good morning everyone. Since I discovered web scraping I, like all of you I imagine, see many things as newly opened doors.

But I also know that it is regulated, and that at least in Europe there are rules. As I have not studied law, they escape me.

I found this:

WPC ESSnet Webscraping policy > draft < :woozy_face:

Does anyone know of a site with a guide to safe web scraping?

According to Europe we are data-driven, but… as a novice, it is very likely that what I come up with is totally illegal…

I leave another link here:

In summary, it would be interesting to know where the clear limits are before starting to play with something so interesting and dangerous :innocent:.



It’s a scary topic, isn’t it? :smile:

According to some of my friends, it depends on the data and how you use it. Web scraping should be fine when the data is public (they made it public so people can use it according to their needs), but scraping Facebook messages and revealing personal information is definitely not.
I think the line between legal and illegal scraping is quite blurry, so I’d stay on the safe side :smiley:

One thing to be careful about is not overwhelming the website you are scraping, so slow the scraper down (for example with the time.sleep() function in Python).
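To make the slowing-down idea concrete, here is a minimal sketch of a throttled loop. The function name `polite_fetch_all`, its parameters, and the one-second default delay are my own assumptions for illustration, not an official recommendation; the right delay depends on the site.

```python
import time


def polite_fetch_all(urls, fetch, delay_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests.

    `fetch` is any function that downloads one page (hypothetical here).
    `sleep` defaults to time.sleep and is a parameter only so the pause
    can be swapped out when trying the function without real waiting.
    """
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            # Wait before every request except the first one.
            sleep(delay_seconds)
        results.append(fetch(url))
    return results
```

You would pass in your own download function as `fetch`, for instance one built on `urllib.request.urlopen`, and raise `delay_seconds` if the site is small or slow.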

I’m sorry that I couldn’t offer any official information. If you find out anything new I’d be happy if you shared it with us; I’m quite curious as well.


Hello @evelin.kanda

The truth is that it is a super tempting technology. Take a car insurance comparison site, for example: it uses different databases to produce a result. I don’t know to what extent they are or are not committing a crime.

If I wanted to compare, for example, clothing stores in the same way, would that be a problem, or does the problem only arise when you make a profit?

Another thing that occurs to me is the case where you derive information by processing the data and then use it (not personal data, of course). We know that it is easy to obtain information this way and that it is hard for anyone to work out where you took it from.

It’s a very interesting field :nerd_face: