by hbbio 1 day ago
Interesting setup, thx for sharing.

How many tokens/sec do you get with 27b? Are you using MTP?

1 comment

I haven't done any in-depth synthetic benchmarks, but I had my Hermes agent run some, and I ran a couple directly on the LLM Gateway. Both showed similar results.

Hermes reported 18.45 tok/s consuming the llama-swap endpoint across the wire. Locally, on the gateway, I got 19-19.1 tok/s. I'm running the Qwen 3.6 27B Q6 model (qwen3-6-27b-q6-k) off LM Studio, and time to first token is under 0.3 s.
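For anyone who wants to reproduce numbers like these, here is a rough sketch of how time to first token and decode throughput could be derived from the timestamps of a streamed response. The function name and its parameters are hypothetical, not part of llama-swap or LM Studio; it only assumes you can record when the request was sent, when the first token arrived, and when the last one did.

```python
def stream_stats(t_request: float, t_first_token: float,
                 t_last_token: float, n_tokens: int) -> dict:
    """Derive latency and throughput from three timestamps (in seconds)."""
    if n_tokens < 2 or t_last_token <= t_first_token:
        raise ValueError("need at least two tokens and a positive decode window")
    # Time to first token: how long the client waited before output started.
    ttft = t_first_token - t_request
    # Decode rate counts only tokens produced after the first one,
    # so prompt processing time does not dilute the figure.
    decode_tps = (n_tokens - 1) / (t_last_token - t_first_token)
    # End-to-end rate includes the wait for the first token.
    overall_tps = n_tokens / (t_last_token - t_request)
    return {"ttft_s": ttft, "decode_tok_s": decode_tps, "overall_tok_s": overall_tps}


# Made-up timestamps for illustration only, not measurements from this setup.
stats = stream_stats(t_request=0.0, t_first_token=0.25,
                     t_last_token=10.25, n_tokens=191)
print(stats)
```

Separating the decode rate from the end-to-end rate matters when comparing a local run against one over the network, since the extra round trip mostly shows up in the time to first token.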

It's not good for conversational use cases, though, since a full response to a prompt can take 1-2 minutes.
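Some rough arithmetic on those numbers, assuming the whole 1-2 minutes is spent generating at about 19 tok/s after a 0.3 s wait for the first token. That is an assumption; part of the time could go to prompt processing or tool calls. The helper name is made up for illustration.

```python
def tokens_generated(seconds: float, tok_per_s: float = 19.0,
                     ttft_s: float = 0.3) -> int:
    """Approximate tokens produced in a response of the given wall-clock length."""
    # Subtract the wait for the first token, then multiply by the decode rate.
    return max(0, round((seconds - ttft_s) * tok_per_s))


for seconds in (60, 120):
    print(f"{seconds} s -> about {tokens_generated(seconds)} tokens")
```

Under those assumptions a 1-2 minute reply works out to roughly 1,100 to 2,300 generated tokens, so the delay would come from long outputs rather than slow startup.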

I have two Hermes Profiles running. One is a personal assistant that manages my backlog, sends me morning reminders, asks for evening updates, and runs overnight research projects for me. The other is a coding helper for personal projects. I can ask it to make changes and it will churn for 15 minutes, submit a PR, and notify me that the PR is ready to review. It's faster than I am at basic coding tasks.