Question on Web Scraping

Hello everyone,
I have taken the web scraping mission, and it uses the requests library, which I think cannot scrape websites built with a lot of JavaScript.

I have been working on a personal project that scrapes a website for tables. So far I have been partially successful because I am using the Selenium library, which automates the manual task of getting data from a website. However, it has not been very robust for me. Most of the time, when I store an element from a web page in a variable, the element becomes inaccessible later (although the web page still has it). I think this is because the website is built with dynamic, interactive elements. The website uses JavaScript heavily: https://cloud.samsara.com/signin
Even the login page shows only the email field at first, and the password field becomes visible only after you enter the email. Is there a better way than Selenium to scrape a website that relies heavily on JavaScript?

Not that I know of. Selenium seems to be the best thing out there.
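
If the elements you store keep going stale, one thing that might help is to look the element up again right before each use instead of holding on to the old reference. Here is a rough sketch of that idea, not a definitive fix. `with_fresh_element` and `read_table_text` are names I made up for illustration, and the `"table"` CSS selector is only an assumption about the page.

```python
import time


def with_fresh_element(locate, action, retry_on, attempts=3, delay=0.5):
    """Call action(locate()), re-locating the element on every attempt.

    locate:   zero-argument function that finds the element again
    action:   function that takes the element and returns a value
    retry_on: exception class (or tuple of classes) that triggers a retry
    """
    for attempt in range(attempts):
        try:
            return action(locate())
        except retry_on:
            # Give up on the final attempt and let the error propagate.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)


def read_table_text(driver, css_selector="table"):
    """Hypothetical helper: read a table's text, retrying if it went stale."""
    # Imported here so the generic helper above works without Selenium.
    from selenium.common.exceptions import StaleElementReferenceException
    from selenium.webdriver.common.by import By

    return with_fresh_element(
        locate=lambda: driver.find_element(By.CSS_SELECTOR, css_selector),
        action=lambda element: element.text,
        retry_on=StaleElementReferenceException,
    )
```

With a running driver you would call `read_table_text(driver)` wherever you previously used the stored element. This only covers the case where the page re-renders the element; it will not help if the element is genuinely gone.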

Some websites have hidden/unofficial API endpoints. Have you checked whether that is the case here?


Hi Slavina. That is interesting. I will have to check whether they have any kind of API. Can you explain how I can find hidden APIs?
Thanks!

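It depends on how the web page talks to its backend: which HTTP method it uses (GET, POST) and what format the data comes back in (JSON, etc.). You can watch these requests in the Network tab of your browser's developer tools while the page loads. If the page pulls its data from a JSON API, for example, you can work out how to call that API yourself to retrieve the data.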

Here's an example using a JSON API: https://ianlondon.github.io/blog/web-scraping-discovering-hidden-apis/
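
Once you have spotted a JSON request in the browser's Network tab, you can often call it directly and skip the browser entirely. Below is a minimal sketch of that, using only the standard library. Everything specific in it is an assumption for illustration: the `https://example.com/api/tables` URL, the `rows` key, and the `id`/`name` fields are hypothetical, and a real site will likely also need the cookies or auth headers you see in the captured request.

```python
import json
import urllib.request


def parse_rows(payload_text):
    """Turn a JSON payload into a list of (id, name) tuples.

    Assumes a hypothetical shape: {"rows": [{"id": ..., "name": ...}, ...]}
    """
    data = json.loads(payload_text)
    return [(row["id"], row["name"]) for row in data.get("rows", [])]


def fetch_rows(url, headers=None):
    """Request the endpoint and parse the response body.

    Pass any headers copied from the request seen in the Network tab.
    """
    request = urllib.request.Request(url, headers=headers or {})
    with urllib.request.urlopen(request, timeout=10) as response:
        return parse_rows(response.read().decode("utf-8"))


# Hypothetical usage (placeholder URL, not a real endpoint):
# rows = fetch_rows("https://example.com/api/tables",
#                   headers={"Accept": "application/json"})
```

Keeping the parsing separate from the fetching lets you test `parse_rows` against a response saved from the browser before sending any requests of your own.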
