by lifeinthevoid
264 days ago
I built a similar system, but I've since sold one of the RTX 3090s. Local inference is fun and feels liberating, but it's also slow, and once I got used to the immense power of the giant hosted models, the fun quickly disappeared. I've kept a single GPU so I can still play around with light local models, but I no longer use it for anything serious.
The issue is not that it's slow. 20-30 tk/s is perfectly acceptable to me.
The issue is that the quality of the models I'm able to self-host pales in comparison to that of SOTA hosted models. They hallucinate more, don't follow prompts as well, and simply generate lower-quality content overall. These issues plague all "AI" models, but they are particularly evident in open-weights ones. Maybe this is less noticeable with behemoth 100B+ parameter models, but to run those I would need to invest much more in this hobby than I'm willing to.
I still run inference locally for simple one-off tasks. But for anything more sophisticated, hosted models are unfortunately still required.