by hooloovoo_zoo
78 days ago
One sentence summary: We fine-tuned a general-purpose model to produce valid benchmark code results and it got better at producing benchmark code results; we didn't bother to evaluate it on anything the model used to be good at.

So no, they are not fine-tuning a general-purpose model to produce "valid benchmark code results."