| I would love to give a quick primer on how I'm using agents. I'll usually have a main line of work I'm focused on. I'll describe the current behavior and the desired changes (e.g. "need to plumb this var through these functions to use it here"). "GPT-5 thinking high" is pretty precise, so if you clearly indicate what you want, it usually does exactly what I request. (If this isn't happening for you, make sure you don't have other context in your codebase that confuses it.)
While it's working, I'll often start a second line of work in parallel, usually explicitly asking it not to make changes, but without switching to ask mode. It will do most of the work to figure out what changes would need to be made, and it summarizes them helpfully, which lets me correct it if it's wrong. You can repeat this for as long as the existing models are busy.
Types of prompts that work well:
Questions: "what's the function or component for doing X?", "where else do we do this pattern?"
Bug prompts: anything that would take you <2h to fix should be promptable in a single prompt. Note that you'll get slightly different responses even with the same prompt, so if at first you don't succeed, explain what went wrong, ask it to improve your prompt, and then try again from scratch. People don't reset context often enough.
Larger-scale architecture / plans: for this I would recommend switching to plan mode and spending some time going back and forth. Often it will get confused, so take your progress (ideally as an .md file) and bring it to a new conversation to keep iterating. You can even have it suggest Jira tickets, etc.
Understanding different models is important:
Claude 4.5 (and most Claude models since 3.5) really want to do stuff. If you leave them unchecked, they'll usually do way more than you asked. And if they perceive themselves to be blocked on a failing test, they might delete it or change it to be useless. That said, they're really extraordinary models when you want a quick prototype fleshed out and don't need to make all of the decisions yourself.
GPT-5 thinking high is my personal favorite (Codex 5 thinking high is also very good in the Codex plugin in VS Code). Create new context often. |
Best things about GPT: the precision. I don't even care that it's slow, it just lets me queue up more work
Best things about codex: it's a little smarter at handling very hard or very easy tasks. It might spend less time on easy tasks and even more time on hard ones
Best things about Grok: speed plus LeetCode-style problem-solving ability
All of them tend to benefit from a feedback loop if you can give them great tests, good static analysis, etc., but they will cheat if you let them (e.g. reaching for "any" in TS to make a type error go away)
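To make the "any" cheat concrete, here is a minimal sketch of what it can look like and one stricter alternative. The `User` shape and the function names are hypothetical, made up purely for illustration; the point is only that `any` switches off the static-analysis feedback loop, while `unknown` plus an explicit check keeps it.

```typescript
// Hypothetical shape for illustration only; not taken from any real codebase.
interface User {
  id: number;
  name: string;
}

// The "cheat": `any` silences the compiler, so a malformed payload is
// returned as if it were a User and the type checker gives no feedback.
function parseUserUnsafe(raw: string): User {
  const data: any = JSON.parse(raw);
  return data; // compiles even when `data` is not a User at all
}

// Type guard: checks at runtime that the value really has the User fields.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v["id"] === "number" && typeof v["name"] === "string";
}

// Stricter alternative: treat the parsed value as `unknown` and narrow it,
// so bad input fails loudly instead of leaking through.
function parseUser(raw: string): User {
  const data: unknown = JSON.parse(raw);
  if (!isUser(data)) {
    throw new Error("payload is not a User");
  }
  return data;
}
```

If you want the loop to catch this automatically, one option (assuming your project already uses typescript-eslint) is to enable its `no-explicit-any` rule, so an agent that reaches for `any` gets a lint failure rather than a green build.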