| HN Mirror


by nijave 3 days ago

Well, it's about GPU VRAM if you want something competitive with cloud-hosted offerings at the performance levels shown in benchmarks. This is a heavily quantized build, with quality degradation and significantly lower performance.

Cloud offerings run at 80-200 tokens/sec versus single-digit tokens/sec locally.

That said, I'm also surprised it runs at all locally. I do think it'd be painfully slow for anything interactive, so you're either relying on another model for a comprehensive design or hoping a one-shot with somewhat degraded quality turns out correctly.
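For a rough sense of what those token rates mean for interactive use, here is a back-of-the-envelope sketch. The 2,000-token response length and the 5 tokens/sec local rate are assumed illustrative figures (the latter standing in for "single digit"), not measurements, and `generation_seconds` is just a hypothetical helper name.

```python
def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Wall-clock time to stream num_tokens at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return num_tokens / tokens_per_second


# Assumed response length for illustration only.
RESPONSE_TOKENS = 2000

# 5 tok/s is an assumed stand-in for "single digit";
# 80 and 200 are the cloud range quoted above.
rates = {
    "local, 5 tok/s": 5,
    "cloud, 80 tok/s": 80,
    "cloud, 200 tok/s": 200,
}

for label, rate in rates.items():
    secs = generation_seconds(RESPONSE_TOKENS, rate)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

Under those assumptions the same response takes about 400 seconds locally versus 10 to 25 seconds in the cloud, and this ignores prompt processing time entirely.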


edg5000 3 days ago

I see. So not quite usable apart from specific use cases. Maybe in a few years we'll see new hardware players and better prices.


nijave 3 days ago

I think we'll see

- better hardware

- more efficient model runtime algorithms/code

- smarter/more efficient models (same capability with fewer parameters)

So ideally these will all come together and help.
