by hnuser123456
286 days ago
Lots of people already have RTX 3090/4090/5090 for gaming and they can run 30b-class models at 40+ tok/sec. There is a huge field of models and finetunes of this size on Hugging Face. They are a little bit dumber than the big cloud models but not by much. And being able to run them 24/7 for just the price of electricity (and the privacy) is a big pull.
No, they can run quantized versions of those models, which are dumber than the unquantized 30b models, which in turn are much dumber than 400b+ models (in my experience).
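For a rough sense of why quantization is unavoidable on these cards, here is a back-of-the-envelope sketch. It counts weight memory only and ignores KV cache, activations, and runtime overhead, so real headroom is smaller. The VRAM figures (24 GB for the 3090 and 4090, 32 GB for the 5090) and the helper name `weight_gib` are my additions, not something stated in the thread, and the GB figures are treated as GiB for simplicity.

```python
# Rough weight-only memory estimate for a dense model.
# Assumption: memory ~= parameter count * bits per weight / 8.
# KV cache, activations and framework overhead are NOT included.

def weight_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30

# Nominal VRAM per card (assumed here, treated as GiB for simplicity).
VRAM_GIB = {"RTX 3090": 24, "RTX 4090": 24, "RTX 5090": 32}

for bits in (16, 8, 4):
    size = weight_gib(30, bits)
    # A card "fits" only if the weights alone are smaller than its VRAM.
    cards = [name for name, vram in VRAM_GIB.items() if size < vram]
    print(f"30B @ {bits}-bit: {size:.1f} GiB, fits on: {cards or 'none'}")
```

By this estimate a 30b model at 16-bit is well beyond any single one of these cards, and it only fits comfortably on all three at around 4 bits per weight.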
> They are a little bit dumber than the big cloud models but not by much.
If this were true, we wouldn't see people paying a premium for the bigger models (like Claude).
For every use case I've thrown at them, it's not a question of "a little dumber", it's the binary fact that the smaller models are incapable of doing what I need with any sort of consistency, and hallucinate at extreme rates.
What's the actual use case for these local models?