by kamikazeturtles
511 days ago
> I think because they are trained on Claude/O1, they tend to have comparable performance.

Why does having comparable performance indicate having been trained on a preexisting model's output? I read a similar claim in relation to another model in the past, so I'm just curious how this works technically.