Hacker News
by benpopper1 842 days ago
One comment to add here - regardless of where you stand on this particular LLM provider:

Do we want knowledge communities like Stack Overflow or Reddit to continue to exist? Should big AI providers that train on their data share some of the value back to the community? Is there an ethical way for web communities to license data to AI providers?

I hope the answer is yes and that there is a path to a productive partnership, one that allows public communities where knowledge is shared freely to thrive, while also bringing more grounded and vetted content to AI systems that are often closed and require a subscription to access.

4 comments

> Should big AI providers that train on their data share some of the value back to the community?

How would they do that? So far, the LLMs can't be trusted to produce accurate answers. The AI companies can pay money to the data sources, but they can't really offer back anything useful (yet, imho).

Integrate back into Stack Overflow for free. There are tons of questions that never get answered. This also provides a public forum where the response can be corrected and given feedback. Symbiosis.
By definition, aren't those difficult questions to answer? Is there any reason to think the LLMs would succeed where humans have failed? I mean, I'm sure they would produce some output...but is a misleadingly-incorrect answer better than no answer to a thorny obscure question?
Well, the least they could do, imo, would be to post or comment that a question is a duplicate of another, or link to the top-voted answer. Similar to how users on HN post links to "ongoing discussion threads" for duped posts. It's grunt work that these bots should at least be able to regurgitate or find easily.

Also, there's a chance these LLMs have access to other tech forums in addition to Stack Overflow and could possibly provide a solution. For example, GitHub has actually been the better source for me when debugging issues. Usually you can go to the repo, search the issues, and read comments with solutions or workarounds.

But aside from that, I agree with you that these bots will struggle to provide new, non-regurgitated answers and could potentially cause more harm than good.

Reddit literally licenses its data to AI training [1]. If doing so kills its own product that would be hilarious.

[1]: https://arstechnica.com/ai/2024/02/reddit-has-already-booked...

Reddit and Stack Overflow had heavily degraded over the years, long before ChatGPT existed; we should remember that.

They offered a convenience by burning money, and the mismanagement and pre-IPO shenanigans certainly are not helping.

They don't own the content, the communities, or the user base, which can move on as it did from IRC to AOL to Discord. What did they learn from the dead communities of the past? Do they sell traction and convenience, which they own, or are they claiming to sell content, which they don't own? Users curated the content, and the most effective mods have left. The content in large parts of their sites had become stale or degraded long before OpenAI existed. Graveyard communities can neither curate nor pay for server costs.

AI is convenient curation, and people are paying for the convenience. AI sites are also losing money per click, with server costs that are out of the galaxy compared to the cached-HTML-and-Elasticsearch-serving crowd.

We have seen the ridiculousness of the AI sites' attempts to introduce management features, short of putting penguins in the desert for animal diversity. But the great teachers of bad management features were Reddit and Stack Overflow, who also actively killed community-developed management modules.

They are failing because they lack a basic understanding of the teachings of centuries of civil society, and they make up what is right, wrong, or politically correct ad hoc, based on marketing. Just trying to avoid bad publicity that could scare off the potential IPO crowd only introduces community debt and grievance. That is what has been killing them.

Wikipedia has not been crying foul, yet it has been curating the highest-quality content for AI, and on a low-cost setup for its size. I just think it's better to donate content and money there.

>> Do we want knowledge communities like Stack Overflow or Reddit to continue to exist?

No, we want better knowledge communities to exist and for Stack Overflow and Reddit to cease to exist.