Hacker News
by cyanydeez 1 hour ago
yeah, then there's prompt loading too.

but anyone who can fit QWEN-3.6 35B with a sustained ~30 tokens/s and ~100k context with cache could print money as a hardware vendor.

1 comment

That just sounds like a 3090.
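
For anyone who wants to sanity-check the "fits on a 3090" claim, here is a rough back-of-envelope sketch of the memory side only (it says nothing about tokens/s). It assumes 4-bit quantized weights, a plain fp16 KV cache, and a 24 GiB card. The layer count, KV head count, and head dimension below are made-up placeholders, not the real model's config, so swap in the actual values before trusting the result.

```python
GIB = 2**30

def weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / GIB

def kv_cache_gib(n_layers: int, n_kv_heads: int, head_dim: int,
                 n_tokens: int, bytes_per_elem: int = 2) -> float:
    """Memory for a standard KV cache, in GiB.

    The leading factor of 2 covers keys plus values; every layer is
    assumed to cache every token, which may overestimate for models
    that use sliding-window or hybrid attention.
    """
    return 2 * n_layers * n_kv_heads * head_dim * n_tokens * bytes_per_elem / GIB

def fits(budget_gib: float, *parts_gib: float) -> bool:
    """True if the summed parts stay within the budget (no overhead modeled)."""
    return sum(parts_gib) <= budget_gib

if __name__ == "__main__":
    # Hypothetical architecture numbers, for illustration only.
    w = weights_gib(35e9, bits_per_weight=4)
    kv16 = kv_cache_gib(n_layers=48, n_kv_heads=4, head_dim=128, n_tokens=100_000)
    kv8 = kv_cache_gib(n_layers=48, n_kv_heads=4, head_dim=128,
                       n_tokens=100_000, bytes_per_elem=1)
    print(f"weights: {w:.1f} GiB")
    print(f"KV cache fp16: {kv16:.1f} GiB, fits in 24 GiB: {fits(24, w, kv16)}")
    print(f"KV cache 8-bit: {kv8:.1f} GiB, fits in 24 GiB: {fits(24, w, kv8)}")
```

With these placeholder numbers the answer flips depending on how the KV cache is stored, so whether it "just sounds like a 3090" comes down to the real architecture and the cache precision you are willing to run.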