|
|
|
|
|
by ofirpress
177 days ago
|
|
> There are certain tasks, like improving a given program for speed, for instance, where in theory the model can continue to make progress with a very clear reward signal for a very long time. Yup, this will absolutely be a big driver of gains in AI for coding in the near future. We actually built a benchmark based on this exact principle: https://algotune.io/ |
|