


	by mariojv 931 days ago
	To me, it seems like more of a competitive issue for OpenAI if part of their secret is the ability to synthesize good training data, or if they're purchasing training data from some proprietary source.

2 comments

valine 931 days ago

I suspect OpenAI’s advantage is their ability to synthesize a good fine-tuning dataset. My question is whether this is leaking data from the fine-tuning dataset or from the initial training of the base model. The base model training data is likely nothing special.


bonzaidrinkingb 931 days ago

Good point. But many are already training directly on output from GPT. That is probably more efficient than copying the raw training data, especially if the copying relies on this non-targeted approach.
