by tlrobinson 3077 days ago
All of which is defeated by OCR.

Good point. OCR-powered web scraping is even available out of the box nowadays.

https://a9t9.com/kantu/docs/scraping#ocr

It is not the OCR that is costly. It is the JavaScript execution needed to render the page so that you can run the OCR. The site can even increase the JavaScript execution cost for a visitor that looks suspicious.
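To make the "increase the cost if suspicious" idea concrete, here is a minimal sketch. The thread does not say how a site would raise the cost, so the hashcash-style puzzle below is my assumption, and the `suspicion` score, `difficulty_bits`, `solve` and `verify` are hypothetical names. A real site would make the browser do this work in JavaScript; Python is used here only to show the scaling.

```python
import hashlib
import itertools


def difficulty_bits(suspicion: float, base_bits: int = 8, max_extra_bits: int = 12) -> int:
    """Map a hypothetical suspicion score in [0, 1] to required leading zero bits."""
    clamped = min(max(suspicion, 0.0), 1.0)
    return base_bits + round(clamped * max_extra_bits)


def leading_zero_bits(digest: bytes) -> int:
    """Count the leading zero bits of a digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()


def verify(challenge: bytes, nonce: int, bits: int) -> bool:
    """Server side: a single hash checks the client's answer."""
    digest = hashlib.sha256(challenge + str(nonce).encode()).digest()
    return leading_zero_bits(digest) >= bits


def solve(challenge: bytes, bits: int) -> int:
    """Client side: brute-force a nonce; expected work is about 2**bits hashes."""
    for nonce in itertools.count():
        if verify(challenge, nonce, bits):
            return nonce


if __name__ == "__main__":
    bits = difficulty_bits(0.0)  # a visitor that does not look suspicious
    nonce = solve(b"page-token", bits)
    print(bits, verify(b"page-token", nonce, bits))
```

Each extra bit doubles the expected work for the client, while verification stays at one hash for the server, so a scraper that gets flagged pays far more per page than an ordinary visitor.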

You will also have to automate every page variation and handle the traditional challenges (login, captcha, user behavior fingerprinting, ...).

In the end, the development time and server costs will put you out of business if you depend too heavily on the information, or you will start to lose money every time you scrape.

Yes. The idea here is to make you dependent on OCR (you also have to find where the information is each time the page design changes) and to waste a lot of your server resources, which makes scraping very costly.