| HN Mirror

by 0vermorrow 331 days ago

I'm eagerly awaiting Qwen 3 Coder becoming available on Cerebras.

I run plenty of agent loops and the speed makes a somewhat interesting difference in time "compression". Having a Claude 4 Sonnet-level model running at 1000-1500 tok/s would be extremely impressive.
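
To put rough numbers on that "compression", here is a back-of-the-envelope sketch. The loop size (20 steps of about 1,500 generated tokens each) and the 75 tok/s baseline are assumptions chosen for illustration, not measurements, and the function ignores network latency and tool execution time.

```python
def generation_seconds(total_tokens: int, tokens_per_second: float) -> float:
    """Time spent purely on token generation (no latency, no tool execution)."""
    return total_tokens / tokens_per_second


# Hypothetical agent loop: 20 steps, ~1,500 generated tokens per step.
STEPS = 20
TOKENS_PER_STEP = 1500
TOTAL_TOKENS = STEPS * TOKENS_PER_STEP

if __name__ == "__main__":
    # 75 tok/s is an assumed baseline; 1000 and 1500 are the speeds mentioned above.
    for speed in (75, 1000, 1500):
        seconds = generation_seconds(TOTAL_TOKENS, speed)
        print(f"{speed:>5} tok/s -> {seconds:6.1f} s of generation")
```

Under these assumptions, generation time for the loop drops from several minutes to well under a minute, which is where the interactive feel would come from.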

To FEEL THE SPEED, you can try it yourself on the Cerebras Inference page, through their API, or, for example, on Mistral's Le Chat with its "Flash Answers" (powered by Cerebras). Iterating on code at 1000 tok/s makes it feel even more magical.

4 comments

scosman 331 days ago

Exactly. I can see my efficiency going up a ton with this kind of speed. Every time I'm waiting for agents, my mind loses some focus and context. Running parallel agents gets more speed, but at the cost of focus. Near-instant iteration loops in Cursor would feel magical (even more magical?).

It will also impact how we work: interactive IDEs like Cursor probably make more sense than CLI tools like Claude Code when answers are nearly instant.

vidarh 331 days ago

I was just thinking the opposite. If the answers are this instant, then, subject to cost, I'd be tempted to have the agent fork and go off and try a dozen different things, and run a review process to decide which approach(es) or parts of approaches to present to the user.

It opens up a whole lot of use cases that'd be a nightmare if you have to look at each individual change.
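
A minimal sketch of that fork-and-review idea, assuming each forked attempt can run as an independent async task. `attempt` and `review` are hypothetical stand-ins (no model is called here), and the scoring rule is arbitrary and only there to make the example runnable.

```python
import asyncio
from dataclasses import dataclass


@dataclass
class Candidate:
    approach: str
    patch: str
    score: float = 0.0


async def attempt(approach: str) -> Candidate:
    """Stand-in for one forked agent run; a real version would call a model."""
    await asyncio.sleep(0)  # yield control, simulating I/O
    return Candidate(approach=approach, patch=f"patch for {approach}")


def review(candidate: Candidate) -> float:
    """Stand-in reviewer: shorter patches score higher (illustrative only)."""
    return 1.0 / len(candidate.patch)


async def fan_out_and_review(approaches: list[str], keep: int = 2) -> list[Candidate]:
    """Run every approach concurrently, score each result, return the top `keep`."""
    candidates = await asyncio.gather(*(attempt(a) for a in approaches))
    for candidate in candidates:
        candidate.score = review(candidate)
    return sorted(candidates, key=lambda c: c.score, reverse=True)[:keep]


if __name__ == "__main__":
    ideas = ["rewrite the parser", "add a cache", "inline the helper"]
    for best in asyncio.run(fan_out_and_review(ideas)):
        print(best.approach, round(best.score, 4))
```

The user only ever sees the shortlist that `fan_out_and_review` returns, not every individual change.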

mogili 331 days ago

Same.

However, I think Cerebras first needs to make its APIs more OpenAI-compatible. I tried their existing models with a bunch of coding agents (including Cline, which they did a PR for) and they all failed to work, either due to a 400 error or because tool calls were not formatted correctly. Very disappointed.
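
For anyone debugging this kind of failure, here is a rough sketch of a shape check for tool calls in an OpenAI-style chat completion message. It assumes the commonly documented layout (each entry has a string `id`, `type` set to `"function"`, and a `function` object whose `arguments` field is a JSON-encoded string). `tool_call_problems` is a hypothetical helper, and it does not claim to reproduce whatever actually went wrong in the cases above.

```python
import json


def tool_call_problems(message: dict) -> list[str]:
    """Return human-readable problems found in message['tool_calls'], if any."""
    problems = []
    for i, call in enumerate(message.get("tool_calls") or []):
        where = f"tool_calls[{i}]"
        if not isinstance(call, dict):
            problems.append(f"{where}: not an object")
            continue
        if not isinstance(call.get("id"), str):
            problems.append(f"{where}: missing string 'id'")
        if call.get("type") != "function":
            problems.append(f"{where}: 'type' should be 'function'")
        function = call.get("function")
        if not isinstance(function, dict):
            problems.append(f"{where}: missing 'function' object")
            continue
        if not isinstance(function.get("name"), str):
            problems.append(f"{where}: missing string 'function.name'")
        arguments = function.get("arguments")
        if not isinstance(arguments, str):
            problems.append(f"{where}: 'function.arguments' should be a JSON string")
        else:
            try:
                json.loads(arguments)
            except json.JSONDecodeError:
                problems.append(f"{where}: 'function.arguments' is not valid JSON")
    return problems
```

Running a check like this on raw provider responses can help tell a provider-side formatting problem apart from a bug in the agent itself.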

meowface 331 days ago

I just set up Groq with Kimi K2 the other day and was blown away by the speed.

Deciding if I should switch to Qwen 3 and Cerebras.

(Also, off-topic, but the name reminds me of cerebrates from Starcraft. The Zerg command hierarchy lore was fascinating when I was a young child.)

throwaw12 331 days ago

Have you used Claude Code, and how does the quality compare to Claude models? I am heavily invested in tools around Claude and am still struggling to make the switch and start experimenting with other models.

meowface 331 days ago

I still exclusively use Claude Code. I have not yet experimented with these other models for practical software development work.

A workflow I've been hearing about is: use Claude Code until quota exhaustion, then use Gemini CLI with Gemini 2.5 Pro free credits until quota exhaustion, then use something like a cheap-ish K2 or Qwen 3 provider, with OpenCode or the new Qwen Code, until your Claude Code credits reset and you begin the cycle anew.
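
That cycle is essentially an ordered fallback chain. A small sketch, assuming each tool can be wrapped in a callable that raises when its quota runs out; `QuotaExhausted`, `run_with_fallback`, and the three stub providers are all hypothetical names for illustration, not real client APIs.

```python
from typing import Callable


class QuotaExhausted(Exception):
    """Raised by a provider callable when its quota or credits have run out."""


Provider = tuple[str, Callable[[str], str]]


def run_with_fallback(prompt: str, providers: list[Provider]) -> tuple[str, str]:
    """Try providers in order; return (provider_name, answer) from the first with quota."""
    for name, call in providers:
        try:
            return name, call(prompt)
        except QuotaExhausted:
            continue  # move on to the next provider in the chain
    raise RuntimeError("all providers are out of quota")


# Stub providers standing in for real clients.
def claude_code(prompt: str) -> str:
    raise QuotaExhausted("Claude Code quota used up")


def gemini_cli(prompt: str) -> str:
    raise QuotaExhausted("Gemini free credits used up")


def cheap_provider(prompt: str) -> str:
    return f"answer to: {prompt}"


if __name__ == "__main__":
    chain = [("claude", claude_code), ("gemini", gemini_cli), ("cheap", cheap_provider)]
    print(run_with_fallback("fix the failing test", chain))
```

When the first provider's quota resets, its callable stops raising and the chain naturally starts from the top again.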

bredren 331 days ago

Are you using Claude Code or the web interface? I would like to try this with CC myself; apparently, with a proxy, an OpenAI-compatible LLM can be swapped in.

throwaw12 331 days ago

I am using Claude Code, and my experience with it so far is great. I use it primarily from the terminal; this way I stay focused on reading code while CC does its job in the background.

bredren 331 days ago

I’ve heard it repeated that, using the env vars, you can use GPT models, for example.

But then also that running a proxy tool locally is needed.

I haven’t tried this setup, and can’t say offhand whether the Cerebras-hosted Qwen described here is “OpenAI” compatible.

I also don’t know if all of the tools CC uses out of the box are supported in the most compatible non-Anthropic models.

Can anyone provide clarity / additional testimony on swapping out the engine on Claude Code?

derac 331 days ago

I've used Kimi K2; it works well. Personally, I'm using Claude Code Router.

https://github.com/musistudio/claude-code-router

mehdibl 331 days ago

The issue is that most Groq models are limited in context length, since long context costs a lot of memory.

zozbot234 331 days ago

Obligatory reminder that 'Groq' and 'Grok' are entirely different and unrelated. No risk of a runaway Mecha-Hitler here!

throwawaymaths 331 days ago

Instead, there's the risk of requiring racks of hardware to run just one model!

logicchains 331 days ago

It'll be nice if this generates more pressure on programming language compilation times. If agentic LLMs get fast enough that compilation time becomes the main blocker in the development process, there'll be significant economic incentives for improving compiler performance.
