by lxgr
810 days ago
Yes, this is incredibly frustrating. I wonder if we've passed "peak publicly searchable discussion". It definitely seems harder to find quick answers on obscure topics than it was 2-3 years ago. LLMs will gladly hallucinate something, but given that this stuff is literally the training data that could help ground them in truth, I wonder where we're going to go next.
|
Now that the corpus of user questions and answers, posts and so on has real value as machine learning training data, it’s hardly surprising this is happening: no one wants to “give away the farm” to a rival LLM product bootstrapped on data that was too easy to scrape.
For older readers who remember the buzz about Web 2.0 in the early 2000s, when everything was going to be a public API or feed, the recent history of the web has been almost the opposite. Examples are everywhere: RSS is essentially dead, news readers died, people are trying to put podcasts behind proprietary systems (Spotify), more and more data is hidden behind account walls, app binaries on mobile often only arrive from a mandatory store…