I have built many scrapers based on Node + Puppeteer, with PostgreSQL as data storage and queue.
Everything works great, but I want to develop my skills further; my next goal is to scrape data from 100,000,000+ pages (I have my own proxy server running Squid). Where can I find resources to learn advanced scraping and performance optimization, etc.?
I don’t know much about Node-based scraping, but I can say that the bottleneck in your approach is Puppeteer, because it drives an actual web browser, even when it runs headless. It will always be slower and less scalable than making requests from a lightweight HTTP client. In Python, for example, that would be requests, aiohttp, httpx, etc.
So I would recommend exploring approaches to scraping without a web browser, since you are likely to run into resource limits when building large-scale scraping systems.
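To make that concrete, here is a minimal Python sketch of the usual browser-free pattern: an async event loop fanning out requests under a concurrency cap (via a semaphore). The `fake_fetch` function is a hypothetical stand-in for a real HTTP call; in production you would replace it with `aiohttp` or `httpx`, and the URLs and concurrency value are illustrative assumptions, not part of the original question.

```python
import asyncio

CONCURRENCY = 200  # assumption: tune to your proxy and DB capacity

async def fake_fetch(url: str) -> str:
    # Hypothetical stand-in for a real HTTP call (aiohttp/httpx in practice).
    await asyncio.sleep(0)  # yield to the event loop, simulating network I/O
    return f"<html>body of {url}</html>"

async def bounded_fetch(sem: asyncio.Semaphore, url: str) -> str:
    # The semaphore caps how many fetches run at once, so millions of URLs
    # don't spawn millions of simultaneous connections.
    async with sem:
        return await fake_fetch(url)

async def crawl(urls: list[str]) -> list[str]:
    sem = asyncio.Semaphore(CONCURRENCY)
    return await asyncio.gather(*(bounded_fetch(sem, u) for u in urls))

pages = asyncio.run(crawl([f"https://example.com/page/{i}" for i in range(5)]))
```

A single process like this can keep thousands of requests in flight, whereas each headless browser tab costs tens of megabytes of RAM; that resource gap is the main reason to drop Puppeteer for pages that don't need JavaScript rendering.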