| HN Mirror

It's an improvement but...I've asked it to do some really simple tasks and it'll occasionally do them in the most roundabout way you could imagine. Like, let's source a bash file that creates and reads a state file to do something for which the functionality was already built-in. Say I'm a little skeptical of this solution and plug it into a new o1-preview prompt to double check the solution, and it starts by critiquing the bash script and error handling instead of seeing that the functionality is baked in and it's plainly documented. Other errors have been more subtle.

When it works, it's pretty good, and sometimes great. But when failure modes look like the above I'm very wary of accepting its output.