by carsoon 207 days ago
We need a Library of Alexandria for primary sources. If we had source transparency, referencing back to original sources would be clearer. We could do cool things like these vintage models to reduce bias from current events. Also, books in every language and books for teaching each language would help with multimodality. Copyright makes it difficult to achieve the best results for LLM creation and usage, though.
2 comments

Ironically enough, that would be practical for "vintage LLM" - perhaps (morally) obligatory?
As if current language models gave a damn about copyright...
The problem is they have to hide their sources due to copyright. So they train on copyrighted data but must obscure it in the output. Thus they must hide the sources of truth, which makes it impossible to fact-check them directly. That is the reason hallucinations are so common and unavoidable in the current pattern.