by tomberin 1180 days ago
:) Agree, but the scraping arms race is way beyond that; if someone doesn't want their page scraped, this isn't a threat to them.

Has it? Can you give me an example of a site that is hard for a motivated attacker to scrape?

I'm curious because I've seen stuff like the above, but of course it only fools a few off-the-shelf tools. It does nothing if the attacker is willing to write a few lines of Node.js.

Try Facebook. I've spent some time trying to make it work, but figured out I can do what I need by using the Bing API instead and get structured data...
I guess the lazy way to prevent this in a foolproof way is to add an OCR step somewhere in the pipeline and use actual images generated from websites. Although maybe then you'll get #010101 text on a #000000 background.
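
To put a rough number on the #010101-on-#000000 worry, here is a minimal sketch that computes the contrast ratio using the WCAG relative-luminance formula. The function names are made up for illustration and are not from any library, and whether a given OCR engine actually fails at that contrast is an assumption, not something tested here.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG definition)."""
    c = channel / 255.0
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color, from 0.0 (black) to 1.0 (white)."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio between two colors, from 1.0 (identical) up to 21.0."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


if __name__ == "__main__":
    # Near-invisible text: the ratio is barely above 1.0.
    print(round(contrast_ratio("#010101", "#000000"), 3))
    # Ordinary black-on-white text for comparison.
    print(round(contrast_ratio("#000000", "#ffffff"), 3))
```

The near-black pair comes out at about 1.006:1 versus 21:1 for black on white, so a screenshot-plus-OCR scraper would probably need to boost contrast on the image before recognition to have any chance with text like that.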