Hacker News new | ask | show | jobs
by latrine5526 102 days ago
I have a 5090d and got ~140 token/s output when running qwen-3.5-9b-heretic in lmstudio.

I disabled the thinking and configured the translate plugin on my browser to use the lmstudio API.

It performs way better than Google Translate in accuracy. The speed is a little slower, but sufficient for me.