|
|
|
|
|
by s17n
498 days ago
|
|
> This is obviously extremely silly, because that's exactly how OpenAI got all of its training data in the first place - by scraping other peoples' data off the internet. OpenAI has also invested heavily in human annotation and RLHF. If all DeepSeek wanted was a proxy for scraped training data, they'd probably just scrape it themselves. Using existing RLHF'd models as replacement for expensive humans in the training loop is the real game changer for anyone trying to replicate these results. |
|
That's like the mafia complaining that they worked so hard to steal those barrels of beer that someone made off with in the middle of the night and really that's not fair and won't someone do something about it?