by montroser 15 days ago
The result is ~12 tokens per second, as reported by OP further down in the comments.

An impressive effort, and better than I would have thought possible on this hardware -- but still pretty far short of what one needs for a satisfactory interactive session.

5 comments

Especially if you consider that those smaller models are really cheap and fast on platforms like OpenRouter. Often 100-500x cheaper than SOTA models, and 2-5x faster in TPS.
Right. You can also perform RSA encryption with pencil and paper and a scientific calculator. It works, but the throughput isn't useful for serious work.
Yeah took way too long to find that result. Being able to run on slow RAM isn't surprising considering you can run a model off an SSD.
I was about to ask that
It's not terrible for interactive... https://mikeveerman.github.io/tokenspeed/?rate=12&mode=text

And it should be just fine for plenty of background use cases.
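
For a rough sense of what 12 tokens per second means in wall-clock time, here is a minimal sketch. The rate is the one reported in the thread; the response lengths (60, 600, and 6000 tokens) and the function name are illustrative assumptions, not measurements from OP.

```python
def seconds_to_generate(num_tokens: int, tokens_per_second: float) -> float:
    """Return the time in seconds to emit num_tokens at a steady generation rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


RATE = 12.0  # tokens per second, as reported in the thread

# Hypothetical response sizes, chosen only for illustration.
EXAMPLES = [
    ("short answer", 60),
    ("long answer", 600),
    ("background job", 6000),
]

for label, num_tokens in EXAMPLES:
    elapsed = seconds_to_generate(num_tokens, RATE)
    print(f"{label}: {num_tokens} tokens -> {elapsed:.0f} s")
```

This ignores prompt processing time, so it is a lower bound on how long a reply would take, not a prediction.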