by mariojv 931 days ago
To me, it seems like more of a competitive issue for OpenAI if part of their secret is the ability to synthesize good training data, or if they're purchasing training data from some proprietary source.

I suspect OpenAI’s advantage is their ability to synthesize a good fine-tuning dataset. My question is whether this is leaking data from the fine-tuning dataset or from the initial training of the base model. The base model training data is likely nothing special.
Good point. But many are already directly training on output from GPT. Probably more efficient than copying the raw training data. Especially if it relies on this non-targeted approach.