by Scrounger 314 days ago
> I remember reading that llm’s have consumed the internet text data

Not just internet text data: most major LLMs have also been trained on millions of pirated books from LibGen:

https://techcrunch.com/2025/01/09/mark-zuckerberg-gave-metas...