Hacker News
by unglaublich 22 days ago
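30 tok/s looks fine when you're just streaming code, but the issue is that a lot of what gets generated is overhead rather than code: tool-calling conventions, metadata, "thinking", etc. All of that is produced at the same 30 tok/s, so the code itself arrives more slowly than the headline number suggests.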
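
To put rough numbers on that, here's a back-of-the-envelope sketch. The overhead fractions and the 600-token snippet size are made-up illustrative values, not measurements, and the function names are hypothetical:

```python
def effective_rate(raw_tok_per_s: float, overhead_fraction: float) -> float:
    """Useful tokens per second when some fraction of the generated
    tokens is overhead (tool-call scaffolding, metadata, "thinking")."""
    if not 0.0 <= overhead_fraction < 1.0:
        raise ValueError("overhead_fraction must be in [0, 1)")
    return raw_tok_per_s * (1.0 - overhead_fraction)


def seconds_for(useful_tokens: int, raw_tok_per_s: float,
                overhead_fraction: float) -> float:
    """Wall-clock seconds until `useful_tokens` of actual code have arrived."""
    return useful_tokens / effective_rate(raw_tok_per_s, overhead_fraction)


if __name__ == "__main__":
    raw = 30.0  # headline tok/s from the comment above
    # ASSUMPTION: these overhead fractions are illustrative guesses only.
    for overhead in (0.0, 0.25, 0.5):
        rate = effective_rate(raw, overhead)
        wait = seconds_for(600, raw, overhead)
        print(f"overhead={overhead:.0%}: {rate:.1f} useful tok/s, "
              f"{wait:.0f}s for 600 tokens of code")
```

If half the tokens were overhead, you'd effectively be getting 15 tok/s of code, and a 600-token snippet would take 40 s instead of 20 s.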