HN Mirror



	by AlanSE 1184 days ago
	Funnily enough, in 2023 the option most likely to be viable for the impossible task of categorizing the training set would be to use an LLM. While I don't "trust" these AIs to give accurate information, it's probably within their capabilities to sort the data into the above-mentioned categories, then feed that back into another (very expensive) round of training, along with some theoretical developments to boot. I do think this is within the realm of possibility in the next ~1 year, but it would be hard.

1 comment

hutzlibu 1184 days ago

Oh, I certainly think LLMs could help with the task of curation. They might even spot lots of potential errors and flaws by themselves, to get to the worst cases in the dataset faster. But to finally confirm or reject the actual data in question, there has to be at least one (not overworked) human in the loop (and many eyes would be better). Otherwise it will just reinforce the existing flaws.
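
A rough sketch of that workflow, in case it helps make the idea concrete: a model scores each record, the worst-scoring ones go to the front of the queue, and only a human verdict settles anything. Everything here is hypothetical. `Record`, `triage`, `review`, and `toy_flagger` are made-up names, and `toy_flagger` is a trivial stand-in for whatever LLM call one would actually use.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass
class Record:
    text: str
    suspicion: float = 0.0          # score from the (hypothetical) model flagger
    verdict: Optional[bool] = None  # stays None until a human has reviewed it


def triage(records: Iterable[Record], flagger: Callable[[str], float]) -> List[Record]:
    """Score every record, then order them worst-first for human review."""
    scored = [Record(r.text, flagger(r.text)) for r in records]
    return sorted(scored, key=lambda r: r.suspicion, reverse=True)


def review(queue: List[Record], human: Callable[[Record], bool], budget: int) -> List[Record]:
    """Only a human verdict marks a record as confirmed; the model never does."""
    for record in queue[:budget]:
        record.verdict = human(record)
    return queue


def toy_flagger(text: str) -> float:
    # Assumption: stand-in for an LLM call. Flags all-caps and very short entries.
    score = 0.0
    if text.isupper():
        score += 0.6
    if len(text.split()) < 3:
        score += 0.3
    return score
```

The `budget` parameter is there so the reviewer is not overworked: the model's ranking decides what gets looked at first, not what gets accepted or thrown out.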
