Hacker News
by throwaway4aday 933 days ago
Cleaning and preparing the dataset is a huge part of training. Like the OP mentioned, OpenAI likely has some high-quality automation for doing this, and that's what has given them a leg up on their competitors. You can apply that same automation to clear out low-quality AI content the same way you remove low-quality human content. It's not about the source; only the quality matters.
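
To make the "source doesn't matter" point concrete, here is a rough sketch of a source-agnostic filter. Everything in it is an assumption for illustration: the `quality_score` heuristic, the `filter_corpus` helper, the 0.5 threshold, and the `source`/`text` record fields are all made up, and nobody outside OpenAI knows what their pipeline actually looks like. A real system would more likely use a trained quality classifier plus deduplication.

```python
import re

def quality_score(text: str) -> float:
    """Toy quality heuristic in [0, 1] (hypothetical, not anyone's real pipeline).

    Combines vocabulary diversity with the share of characters that are
    letters or whitespace. Very short or word-free text scores 0.
    """
    words = re.findall(r"[A-Za-z']+", text.lower())
    if len(words) < 5:
        return 0.0
    unique_ratio = len(set(words)) / len(words)  # penalizes repetition
    alpha_ratio = sum(c.isalpha() or c.isspace() for c in text) / len(text)
    return unique_ratio * alpha_ratio

def filter_corpus(docs, threshold=0.5):
    """Keep documents whose score meets the threshold.

    The 'source' field is deliberately never inspected: human and AI text
    pass through exactly the same check.
    """
    return [d for d in docs if quality_score(d["text"]) >= threshold]

docs = [
    {"source": "human", "text": "Tokenizers split text into subword units before training begins."},
    {"source": "ai", "text": "Deduplication removes near identical documents so the model sees each idea once."},
    {"source": "ai", "text": "buy now buy now buy now buy now buy now"},
    {"source": "human", "text": "$$$ !!! ### 123 ???"},
]

kept = filter_corpus(docs)
kept_sources = [d["source"] for d in kept]
```

With these toy inputs, one human document and one AI document survive, and one of each is dropped, purely on the score.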