by nijave 3 days ago
Well, it's about GPU VRAM if you want something competitive with cloud-hosted offerings at the performance levels shown in benchmarks. This is a heavy quant with quality degradation and significantly lower performance.

Cloud offerings run at 80-200 tk/sec versus single-digit tk/sec locally.

That said, I'm also surprised it runs at all locally. I do think it'd be painfully slow for anything interactive, so you're either relying on another model for a comprehensive design or hoping a one-shot with somewhat degraded quality turns out correct.
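
To put the throughput gap in wall-clock terms, here is a rough back-of-the-envelope sketch. The 1,000-token response length and the 5 tk/sec local rate are assumptions picked for illustration (the comment only says "single digit"), and `generation_seconds` is a hypothetical helper, not part of any library. It also ignores prompt processing time.

```python
def generation_seconds(num_tokens: int, tokens_per_sec: float) -> float:
    """Time to stream num_tokens at a steady decode rate (prompt processing ignored)."""
    if tokens_per_sec <= 0:
        raise ValueError("tokens_per_sec must be positive")
    return num_tokens / tokens_per_sec


# Assumed response length, for illustration only.
RESPONSE_TOKENS = 1000

# The cloud rates are the 80-200 tk/sec range quoted above;
# the local rate is an assumed value within "single digit".
rates = [
    ("cloud, low end", 80.0),
    ("cloud, high end", 200.0),
    ("local, assumed", 5.0),
]

for label, rate in rates:
    seconds = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{label}: {seconds:.1f} s for {RESPONSE_TOKENS} tokens")
```

Under those assumptions a 1,000-token answer takes roughly 5 to 12.5 seconds from a cloud endpoint and about 200 seconds locally, which is why anything interactive gets painful.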

I see. So not quite usable apart from specific use cases. Maybe in a few years we'll see new hardware players and better prices.

I think we'll see:

- better hardware

- more efficient model runtime algorithms/code

- smarter/more efficient models (same capability with fewer parameters)

So ideally these will all come together and help.
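
As a rough illustration of why parameter count and quantization both matter for VRAM, the weights alone need about (parameters x bits per weight / 8) bytes. The sketch below is only that arithmetic: `weight_memory_gib` is a hypothetical helper, the 70B parameter count is an assumed example rather than the model discussed here, and KV cache, activations, and runtime overhead are left out.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB.

    Ignores KV cache, activations, and runtime overhead,
    so real usage will be higher.
    """
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 2**30


# Assumed parameter count, for illustration only.
ASSUMED_PARAMS = 70e9

for bits in (16, 8, 4):
    gib = weight_memory_gib(ASSUMED_PARAMS, bits)
    print(f"{bits}-bit weights: ~{gib:.0f} GiB")
```

Halving either the bits per weight or the parameter count halves the weight footprint, which is why a heavier quant or a smaller model at the same capability is what gets these onto consumer hardware.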