Hacker News
by ipsum2 599 days ago
I don't know about you, but o1-preview/o1-mini have been able to solve many moderately challenging programming tasks that would've taken me 30 mins to an hour. No earlier model could've done that.
1 comment

It's an improvement, but... I've asked it to do some really simple tasks and it'll occasionally do them in the most roundabout way you could imagine. For example, it'll source a bash file that creates and reads a state file to do something for which the functionality was already built in. Say I'm a little skeptical of this solution and plug it into a new o1-preview prompt to double-check it: it starts by critiquing the bash script and its error handling instead of seeing that the functionality is baked in and plainly documented. Other errors have been more subtle.

When it works, it's pretty good, and sometimes great. But when failure modes look like the above I'm very wary of accepting its output.

> I've asked it to do some really simple tasks and it'll occasionally do them in the most roundabout way you could imagine.

But it still does the tasks you asked for, so that's the part that really matters.