Hacker News
by koolala 664 days ago
They don't make the source data accessible :(

> they don't make the source data accessible

No. But you haven’t articulated why making everyone’s Facebook chats public is a net good. What practical benefits does opening up that data confer?

Given what we know about LLMs, a model trained only on public-domain data will underperform one trained on that plus proprietary data. If you want the source data available, you have to concede either that "open" models will be structurally handicapped or that all data must be public.

You think Llama is trained on people's private messages? :( That isn't good...

> You think Llama is trained on people's private messages?

Facebook says no, at least for Llama [1].

[1] https://itlogs.com/facebook-uses-user-data-to-train-ai-but-l...