by rootw0rm 1041 days ago
There's never going to be a perfect solution; it's an arms race. I really doubt (hope?) that large entities are going to straight up emulate end-user browsers, though.

I would think filtering based on user agent will be the sweet spot for effort and performance. You could do some awful JavaScript monstrosity to detect the tiny fraction of bots that are sneaky, but if they're determined to be sneaky they will succeed at scraping.
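
As a rough sketch of what user-agent filtering could look like, here is a small WSGI middleware in Python. The token list and the names `BLOCKED_AGENT_TOKENS`, `is_blocked_agent`, and `block_crawlers` are made up for illustration; the tokens are examples of strings some crawlers announce, not a complete or current list. It only stops bots that identify themselves honestly, which is exactly the trade-off described above.

```python
import re

# Hypothetical blocklist (an assumption for this sketch): substrings that
# some self-identifying crawlers put in their User-Agent header. A real
# list would need regular maintenance.
BLOCKED_AGENT_TOKENS = ("GPTBot", "CCBot", "ClaudeBot", "Bytespider")

# One case-insensitive pattern that matches any of the tokens.
_BLOCKED_RE = re.compile(
    "|".join(re.escape(token) for token in BLOCKED_AGENT_TOKENS),
    re.IGNORECASE,
)


def is_blocked_agent(user_agent):
    """Return True if the User-Agent string contains a blocked token."""
    if not user_agent:
        # Missing header: allowed here; a stricter policy could block it.
        return False
    return _BLOCKED_RE.search(user_agent) is not None


def block_crawlers(app):
    """Wrap a WSGI app so that blocked agents get a 403 response."""
    def middleware(environ, start_response):
        if is_blocked_agent(environ.get("HTTP_USER_AGENT", "")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everything else passes through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Any WSGI application can be wrapped with `block_crawlers(app)`. A scraper that sends a browser-like User-Agent passes straight through, so this is the cheap first layer and nothing more.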

2 comments

User agent matching isn't good enough. The stakes are high -- all it takes is one AI crawler to grab my site data, and that data is included in the training forevermore.

> if they're determined to be sneaky they will succeed at scraping.

Yes, which is why I suspect I will never be able to open my websites up to the general public again. I live in hope anyway.

Browsers aren't really trusted platforms; the cool scraping is in emulating phones, whether by actually running a virtual phone or by sending traffic that emulates one.

That really just encourages phones to be even more locked down.