Hacker News
by Funes- 408 days ago
How ironic. "AI" feeds off structured knowledge, artistic creations and otherwise any human production to generate its output. As a consequence of its widespread adoption, people start to lean even more towards consuming rather than producing, a tendency which was already increasing before the advent of LLMs and modern machine-learning. This, in turn, leaves "AI" implementations with no new human content to feed off of. Now what? The whole process folds onto itself. Are we entering the dark ages of cultural (in the widest sense of the word) production? Not that I don't think that we're already there, in any case, but for other, somewhat related causes...

Perhaps the next step is having the LLMs ask questions on SO when they routinely fumble particular topics. I could see a system of knowledge bounties where people are compensated for providing accurate, in-depth training data on niche topics.
LLM content is banned everywhere on Stack Overflow, in both questions and answers, by policy, since mere days after the public announcement of ChatGPT (because it was immediately causing a huge problem): https://meta.stackoverflow.com/questions/421831

Moderators (actual elected moderators, the two dozen or so that exist for ~29 million user accounts and ~24 million non-deleted questions) went on strike in mid 2023, largely because the site staff/owners interfered with their ability to remove such content (an overwhelmingly popular policy with strong community consensus): https://meta.stackoverflow.com/questions/425000 and this decision propagated across the Stack Exchange network (as most SE sites had adopted similar policies): https://meta.stackexchange.com/questions/389811/

A large fraction of the userbase is explicitly opposed to helping LLMs out in any way whatsoever. I personally have ceased contributing new question or answer content, and only edit existing posts. I contribute new content on Codidact (https://software.codidact.com/) instead (disclosure: I have recently become a moderator there).

you’re one or two additional sentences away from the plot to The Matrix
> This, in turn, leaves "AI" implementations with no new human content to feed off of. Now what?

You seem to be under the impression that AI needs more than all recorded human knowledge up until 2024 to reach the same level as an average SO contributor. It doesn't. Because none of the average SO contributors did.

It is unclear what algorithmic improvements are required to leverage the available data to get AI to AGI, but a lack of data is definitely not the bottleneck.

One could say that these AI systems aren't sharing their solutions (or questions) with other AI systems and that the world would benefit from it if they did, though. Perhaps it's a good idea to have some shared space for AI systems where they share the validated solutions they synthesized.

> You seem to be under the impression that AI needs more than all recorded human knowledge up until 2024 to reach the same level as an average SO contributor.

Replacing the average SO contributor isn't adequate to replace SO, and AI has been able to “replace” SO effectively only since major models got not just SO-as-training-data but also web search (including SO) for immediate grounding.

And without SO, or something like it with active human contributions, it’ll have even more trouble replacing the value SO would provide for new questions and new domains, where it will have neither SO training data nor SO query-time search results to use to synthesize answers.

You're not addressing my main point, which is that humans don't need anything close to the amount of relevant data available to current and near-future AI to reach SO contributor level. The idea that the lack of new human-synthesized Stack Overflow data must be a future bottleneck is thus nonsense.

Don't pretend that the current state of LLM training is somehow indicative of a fundamental problem for AI.