HN Mirror



	by cornel_io 815 days ago
	Most human authors are frankly far too stupid to be worth reading, even if they do put care into their work. This, IMO, is the actual biggest problem with LLMs training on whatever the biggest available text corpus is: they don't account for the fact that not all text is equally worthy of next-token-predicting. This problem is completely solvable, almost trivially so, but I haven't seen anyone publicly describe a (scaled, in production) solution yet.

2 comments

mistermann 815 days ago

> This problem is completely solvable, almost trivially so, but I haven't seen anyone publicly describe a (scaled, in production) solution yet.

Can you explain your solution?


pcthrowaway 814 days ago

I imagine it looks something like "Censor all writing that contradicts my worldview"


Ma8ee 815 days ago

It hardly matters what sources you use if you filter them through something that has less understanding than a two-year-old, if any, no matter how eloquently it can express itself.
