Hacker News
by kamikazeturtles 511 days ago
> I think because they are trained on Claude/O1, they tend to have comparable performance.

Why does having comparable performance indicate having been trained on a preexisting model's output?

I read a similar claim in relation to another model in the past, so I'm just curious how this works technically.

1 comment

Because the Valley is burning money and GPUs training these models. When somebody else comes out with a comparable model for a tiny fraction of the cost, it's an easy assumption to make that it was trained on synthetic data, i.e. on a preexisting model's output.