by giobox 809 days ago
Of course we have passed it. Once LLM training took off, everyone started locking down access to their data or raising the cost of developer API access - Twitter/X has done this, and so have Quora and others.

Now that the corpus of user questions/answers, posts and so on has real value as machine learning training data, it’s hardly surprising this is happening - no one wants to “give away the farm” to a rival LLM product bootstrapped on data that was too easy to scrape.

For older readers who remember the buzz about Web 2.0 in the early 2000s, when everything was going to be a public API or feed - the recent history of the web has been almost the opposite. Examples are everywhere - RSS is essentially dead, news readers died, people are trying to put podcasts behind proprietary systems (Spotify), more and more data is hidden behind account walls, app binaries on mobile often only arrive from a mandatory store…