
> it's amazing what they can do, but left to their own devices they'll make boneheaded decisions.

Yeah, the whole "can run for 9 hours on a task" thing is not a positive to me.

I tend to find if Opus 4.8 runs for ~15 mins on a task, then the end result has gone off in a weird direction at some point, and it needs winding back a fair bit.

And that's with extremely clear direction, literal specification docs to follow, etc.

That being said, having functional code already in place beforehand (i.e. written by a human) goes a long way toward giving the AI model a path it can build on without making too many dumb architectural choices by itself. Generally.