Hacker News
by rdedev 830 days ago
Claude 3 does use publicly available data. Not everything is synthetically generated. Look at the training data section in the link below. It has a quote from the paper stating that it uses a mix of public data, data from labelers, and synthetic data.

https://www.lesswrong.com/posts/JbE7KynwshwkXPJAJ/anthropic-...

I can't find a link to the actual Claude paper to verify the above link, but a few other places mention the same thing about the training data. We don't know whether the improved performance is due to synthetic data or something else. I'm guessing even Anthropic might not know.