by derefr 1104 days ago
A server-rendered HTML website is an API. If that website is unauthenticated and freely accessible to any user-agent, then it's a de facto public API, too. This is the case for the Apple discussion boards: you don't have to authenticate with their backend before scraping anything you'd like off the site. You can build third-party tools that read, or even interact with, the site by scraping its HTML.
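
To make "the HTML is the API" concrete, here's a rough sketch of a scraper that pulls thread links out of a listing page using only Python's standard library. The markup, the `thread-title` class name, and the sample page are all made up for illustration; they aren't taken from Apple's site or anyone else's, and a real tool would fetch the page over HTTP and match whatever markup the site actually serves.

```python
from html.parser import HTMLParser


class ThreadTitleParser(HTMLParser):
    """Collects (href, title) pairs from <a class="thread-title"> links.

    The "thread-title" class is a hypothetical convention for this sketch.
    """

    def __init__(self):
        super().__init__()
        self.threads = []
        self._in_link = False  # True while inside a matching <a> element
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "a" and "thread-title" in classes:
            self._in_link = True
            self._href = attrs.get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            title = "".join(self._text).strip()
            self.threads.append((self._href, title))
            self._in_link = False


def scrape_threads(html):
    """Treat the page as an API response: return its thread links as data."""
    parser = ThreadTitleParser()
    parser.feed(html)
    parser.close()
    return parser.threads


# Stand-in for a page fetched from an unauthenticated, server-rendered site.
SAMPLE_PAGE = """
<ul>
  <li><a class="thread-title" href="/thread/1">Battery drains overnight</a></li>
  <li><a class="nav" href="/help">Help</a></li>
  <li><a class="thread-title" href="/thread/2">Wi-Fi keeps dropping</a></li>
</ul>
"""

print(scrape_threads(SAMPLE_PAGE))
```

The point is just that the "response schema" is the page structure: as long as the server hands the HTML to any user-agent, a client can turn it back into structured data without ever touching an official API.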

This thing with Reddit is only the big deal that it is because Reddit's backend blocks "app" user-agents from simply scraping pages from the non-authenticated Reddit HTML website. (If it didn't, the third-party apps would just do that, and none of this would be an issue.) Instead, these third-party UAs are forced to go through the authenticated data API, where they can be API-credit-limited and pushed into paid data-API subscription plans.