| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bko 222 days ago
	I thought the number of tokens per second doesn't matter until I used Grok Code Fast. I realized that it makes a huge difference. If it take more than 30s to run, I lose focus, and look at something else. I end up being a lot less productive. It also opens up the possibility to automate a lot more simple tasks. I would def recommend people try fast models

2 comments

manquer 222 days ago

If you are single tasking, speed matters to an extent. You need to still be able to read/skim the output and evaluate its quality.

The productive people I know use git worktrees and are multi-tasking.

The optimal workflow is when you can supply it one or more commands[1] that the model can run to validate/get feedback on its own. Think of it like RLHF for the LLM, they are getting feedback albeit not from you, which can be laborious.

As long as the model gets feedback it can run fairly autonomously with less supervision it does not have to testing driven feedback, if all it gets is you as the feedback, the bottleneck will be always be the human time to read, understand and evaluate the response not token speed.

With current leading models doing 3-4 workflows in parallel is not that hard, when fully concentrating, of course it is somewhat less when browsing HN :)

---

[1] The command could be a unit test runner, or a build/compile step, or e2e workflows like for UI it could be Chrome MCP/CDP, playwright/cypress, or storybook-js and so on. There are even converts toversion of TDD to benefit from this gain.

You could have one built for your use case if no existing ones fit, with model help of course.

link

SOLAR_FIELDS 222 days ago

Hmm. I run maybe 3 work streams max in parallel and struggle to keep up with the context switching. I have some level of skepticism that your colleagues are amazingly better and do 4 and produce quality code at a faster rate than 1 or 2 work streams in wall clock time. I consider a workstream to be disparate features or bugs that are unrelated and require attention. Running 8 agents in parallel that are all doing the same thing is of course trivial nowadays but that in of itself is what I would consider a single threaded workstream.

link

manquer 222 days ago

We have similar definition of streams, but It depends on a lot of things from your tooling/ language , stack etc.

if your builds take a fair bit of time (incremental builds may not work in worktree first time) or you are working on a item that has high latency feedback like e2e suite that runs on a actual browser etc.

Prompt styles also influences this. I like to make fairly detailed prompt that cover a lot of the nuances upfront and spend 10-15 or more writing it. I find that when I do that it takes longer, but I only give simple feedback during the run itself freeing me to go next item. Some people prefer chat style approach, you cannot keep lot of threads in mind if chatting.

Model and cli client choice matters , on average codex is slower than sonnet 4.5 . Within each family if you enable thinking or use the high reasoning model it can be slower as well.

Finally not all tasks are equal, I like to mix some complex and simpler ones or add some dev ex or a refactor that requires lower attention budget with features that require more.

Having said that, while I don’t know 10x type developers. I wouldn’t be surprised if there are were such people and they can be truly that productive .

The analogy I think of is chess. Maybe I can play 2-3 games in parallel reasonably well, but there are professional players who can play dozens of games blindfolded and win all of them.

link

SOLAR_FIELDS 221 days ago

Nice answer - all of the above aligns with my experience.

I use sonnet a lot more than openai models and its speed means I do have to babysit it more and get chattier which does make a difference, probably you are right that if I was using codex which is on average 4-6 times slower than claude code that I would have more mental bandwidth to handle more workstreams.

link

nextaccountic 221 days ago

This reads like satire. Who can work on two separate features at the same time?

link

LeafItAlone 221 days ago

I completely agree. Grok’s impressive speed is a huge improvement. Never before have I gotten the wrong answer faster than with Grok. All the other LLMs take a little longer and produce a somewhat right answer. Nobody has time to wait for that.

link