by AlanSE 1136 days ago
Funny enough, in 2023, the option most likely to be viable for the impossible task of categorizing the training set would be to use an LLM. While I don't "trust" these AIs to give accurate information, it's probably within their capabilities to sort the data into the above-mentioned categories... then feed that back into another (very expensive) round of training, along with some theoretical developments to boot. I do think this is within the realm of possibility in the next ~1 year, but it would be hard.

Oh, I surely think LLMs could help with the task of curation. They could maybe even spot lots of potential errors and flaws by themselves, to get to the worst cases in the dataset faster. But to finally confirm or reject the actual data in question, there has to be at least one (not overworked) human in the loop (and many eyes would be better). Otherwise it will just reinforce the existing flaws.