Hacker News new | ask | show | jobs
by wrsh07 252 days ago
I would love to give a quick primer on how I'm using agents:

I'll usually have a main line of work I'm focused on. I'll describe the current behavior and desired changes (need to plumb this var through these functions to use here). "Gpt 5 thinking high" is pretty precise, so if you clearly indicate what you want it usually does exactly what I request. (If this isn't happening for you, make sure you don't have other context in your codebase that confuses it)

While it's working, I'll often be able to prompt another line of work, usually requesting explicitly it not make changes but not switching to ask mode. It will do most of the work to figure out what changes would need to be made and it summarizes them helpfully which allows me to correct it if it's wrong. You can repeat this for as long as the existing models are busy

Types of prompts that work well:

Questions: "what's the function or component for doing X", where else do we do this pattern?

Bug prompts (anything that would take you <2h to fix should be promptable in a single prompt, note you'll get slightly different responses even with the same prompt, so if at first you don't succeed you might explain what went wrong, ask it to improve your prompt, and then try again from scratch. People don't reset context often enough)

Larger scale architecture / plans - this I would recommend switching to plan mode and spending some time going back and forth. Often it will get confused so take your progress (ideally as an .md file) and bring it to a new conversation to keep iterating.

You can even have it suggest jira tickets etc

Understanding different models is important: Claude 4.5 (and most Claude models since 3.5) really want to do stuff. And if you leave them unchecked they'll usually do way more than you asked. And if they perceive themselves to be blocked on a failing test they might delete it or change it to be useless. That said, they're really extraordinary models when you want a quick prototype fleshed out where you don't make all of the decisions. Gpt 5 thinking high is my personal favorite (codex 5 thinking high is also very good in the codex plugin in vscode). Create new context often.

2 comments

Best things about Claude: it will often figure out a good feedback loop where it can build + test and get quick feedback about whether the thing is working. This works best in Claude code but can be effective in cursor too

Best things about gpt: the precision. I don't even care that they're slow, it just let's me queue up more work

Best things about codex: it's a little smarter at handling very hard or very easy tasks. It might spend less time on easy tasks and even more time on hard ones

Best things about grok: speed plus leetcode style ability

All of them tend to benefit from a feedback loop if you can give them great tests or good static analysis etc, but they will cheat if you let them (any in ts)

I've used this analogy many times:

Codex + GPT-5-high is an offshore consultant. You give it the spec and it'll do the work and come back with something.

Claude is built like a pair programmer, it chats while it works and you can easily interrupt it without breaking the flow.

Codex is clearly more thorough, it's _excellent_ at picking apart Sonnet 4.5 code and finding the subtle gotchas it leaves behind when it just plows to a result.

And like you said, Claude is results first. It'll get where you want it to go, even if it has to mock the whole application to get the tests to pass. =)