|
|
|
|
|
by hmottestad
291 days ago
|
|
Been playing with Codex CLI the past week and it really loves to create a fix for a bug by adding a special case for just that bug in the code. It couldn't see the patterns unless I pointed them out and asked it to create new abstractions. It would just keep adding what it called "heuristics", which were just if statements that tested for a specific condition that arose during the bug. I could write 10 tests for a specific type of bug, and it would happily fix all of them. When I add another one test with the same kind of bug it obviously fails, because the fix that Codex came up with was a bunch of if statements that matched the first 10 tests. |
|
I am convinced this behaviour and the one you described are due to optimising for swe benchmarks that reward 1-shotting fixes without regard to quality. Writing code like this makes complete sense in that context.