| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by kasey_junk 291 days ago

My one shot rate for unattended prompts (triggered GitHub actions) has gone from about 2 in 3 to about 4 in 5 with my upgrade to 4.5 in the codebase I program in the most (one built largely pre-ai). These are highly biased to tasks I expect ai to do well.

Since the upgrade I don’t use opus at all for planning and design tasks. Anecdotally, I get the same level of performance on those because I can choose the model and I don’t choose opus. Sonnet is dramatically cheaper.

What’s troubling is that you made a big deal about not hearing any stories of improvements as if your bar was very low for said stories, then immediately raised the bar when given them. It means that one doesn’t know what level of data you actually want.

1 comments

Retric 291 days ago

Requesting concrete examples isn’t a high bar. Autopilot got better tells me effectively nothing. Autopilot can now handle stoplights does.

link

kasey_junk 291 days ago

“ but i cannot point to a single person who seems to think that they are accomplishing real-world tasks with GPT5 better than they were with GPT4.”

I don’t use OpenAI stuff but I seem to think Claude is getting better for accomplishing the real world tasks I ask of it.

link

Retric 291 days ago

Specifics are worth talking about. I just felt it unfair to complain about raising the bar when you didn’t initially reach it.

In your own worlds: “You asked for a single anecdote of llms getting better at daily tasks.”

Which is already less specific than their request: “i'd love to hear tangible ways it's actually better.”

Saying “is getting better for accomplishing the real world tasks I ask of it” brings nothing to a discussion and was the kind of vague statement that they were initially complaining about. If LLM’s are really improving it’s not a major hurdle to say something meaningful about what specific is getting better. /tilting at windmills

link