by welanes 1698 days ago
Another great resource is incolumitas.com. A list of detection methods is here: https://bot.incolumitas.com/

I run a no-code web scraper (https://simplescraper.io) and we test against these.

Having scraped millions of webpages, I find dynamic CSS selectors a bigger time sink than most of the anti-scraping tech I've encountered so far (if your goal is to extract structured data).
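
To illustrate what "dynamic" means here: many sites generate class names with a build-time hash suffix, so a selector like `.price-b33d1` breaks on the next deploy. A rough sketch of one workaround, matching on the stable prefix of the class name instead of the full name, using only Python's standard library. The markup, class names, and the `extract` helper are all made up for the example, and real pages will need more care (nested tags, several matches per field, and so on).

```python
from html.parser import HTMLParser

# Hypothetical markup: the suffixes ("-x7f2a" etc.) stand in for build-time hashes
# that change between deploys, while the prefix before the dash stays stable.
SAMPLE_HTML = """
<div class="product-x7f2a">
  <span class="title-9c1e0">Blue Widget</span>
  <span class="price-b33d1">19.99</span>
</div>
"""


class PrefixExtractor(HTMLParser):
    """Collect the text of elements whose class starts with a known prefix."""

    def __init__(self, prefixes):
        super().__init__()
        self.prefixes = prefixes
        self.current = None  # field currently being captured, if any
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        for cls in classes:
            for prefix in self.prefixes:
                # Match "price-<anything>" rather than the exact hashed name
                if cls.startswith(prefix + "-"):
                    self.current = prefix
                    return

    def handle_data(self, data):
        # Store the first non-blank text seen inside a matched element
        if self.current and data.strip():
            self.fields[self.current] = data.strip()
            self.current = None

    def handle_endtag(self, tag):
        self.current = None


def extract(html, prefixes=("title", "price")):
    """Hypothetical helper: return {prefix: text} for matched elements."""
    parser = PrefixExtractor(prefixes)
    parser.feed(html)
    return parser.fields


result = extract(SAMPLE_HTML)
print(result)
```

The same idea in plain CSS is an attribute prefix selector such as `[class^="price-"]`, which only works when the hashed class comes first in the attribute. Prefix matching still fails when the whole class name is randomised, at which point you are back to anchoring on structure or nearby text.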

1 comment

Can your scraper be used to scrape images? I need to scrape some books from a paywalled site, and they are presented a page at a time. The JS is too complex for me to bother figuring out how it generates the unique token it attaches to every image it displays, which is there to prevent a very simple scrape.