by kjkjadksj 693 days ago
Seems to me we might eventually hit a point where stuff like API access is whitelisted. You will have to build a real relationship with a real human at the company to validate you aren’t a bot. This might include an in-person meeting, as anything else could be spoofed. Back to the 1960s business world we go. Thanks, technologists, for pulling the rug out from under us all.
5 comments

Scraping implies not using an API - they're accessing the site as a user agent. And whitelisting access to the actual web pages isn't a tenable option for many websites. Humans generally hate being forced to sign up for an account before they can see a page that they found in a Google search.
Scraping often uses the same APIs that the website itself does, so to make whitelisting work a lot of sites would have to put their content behind authentication of some sort.

For example, I have a project that crawls the SCP Wiki (following best practices, rate limiting, etc.). If they were to restrict the API that I use, it would break the website for regular visitors. So if they do want to limit access, they have no choice but to put the content behind some set of credentials that they could trace back to a user, and eliminate the public site itself. For a lot of sites that's just not reasonable.
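
For anyone wondering what "rate limiting" means on the crawler side, here is a rough sketch of the idea, not the code from the project above. The `RateLimiter` class, its `min_interval` parameter, and the injectable `clock` and `sleep` arguments are all made up for illustration; the only real calls are `time.monotonic` and `time.sleep` from the Python standard library.

```python
import time
from typing import Callable, Optional


class RateLimiter:
    """Enforce a minimum gap between successive requests (hypothetical helper)."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self.min_interval = min_interval  # seconds between requests (assumed value)
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None  # time of the previous request, if any

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        delay = 0.0
        if self._last is not None:
            elapsed = self._clock() - self._last
            delay = max(0.0, self.min_interval - elapsed)
            if delay > 0:
                self._sleep(delay)
        # Record when this request is (about to be) sent.
        self._last = self._clock()
        return delay
```

A crawler would call `limiter.wait()` right before each fetch, so requests are spaced out no matter how fast the rest of the loop runs. Note that this only makes the crawler polite; it does nothing to help the site tell it apart from a human visitor.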

You can't whitelist and also have a consumer-facing service. There is no reliable way to differentiate between a legitimate user and the AI company's scraper.
Yep, it reminds me of the Ferrari almost-scam that was thwarted because the target thought to verify by asking about something that had only been shared in person.
I could definitely see this. I worked for a company that had a few popular free inspector tools on their website. The constant traffic load of bots was nuts.