by stared
10 days ago
Nice! I remember "Baba is Eval" (https://fi-le.net/baba/), released 11 months ago, back when Claude Opus 4 was the strongest model. Back then, I was surprised how poor it was even at the first level. I am happy to see another approach, and indeed one with much stronger results.

While I did implement a more comprehensive harness with pathfinding tools etc., the models themselves have improved significantly.