|
|
|
|
|
by tobr
97 days ago
|
|
I wonder why they fail this specific way. If you just let them do stuff everything quickly turns spaghetti. They seem to overlook obvious opportunities to simplify things or see a pattern and follow through. The default seems to be to add more, rather than rework or adjust what’s already in place. |
|
I wonder if these RL runs can extend over multiple sequential evaluations, where poor design in an early task hampers performance later on, as measured by amount of tokens required to add new functionality without breaking existing functionality.