by zozbot234
36 days ago

But that's why you shouldn't expect local models to provide quick real-time answers, at least not with the same smarts as SOTA models running in the cloud. Slow batched inference (if possible; RAM capacity can obviously be a challenge with typical models and end-user hardware) can make much better use of the same hardware.
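
A rough back-of-envelope sketch of why batching helps, assuming a simple roofline model in which each decode step must stream all model weights from memory once (shared by every request in the batch) while compute grows linearly with batch size. It ignores KV-cache reads, prefill, and scheduling overhead, and all hardware numbers below are hypothetical placeholders, not measurements:

```python
def decode_tokens_per_second(weight_bytes, mem_bandwidth, flops_per_token,
                             compute_flops, batch_size):
    """Aggregate decode throughput under a simplified roofline model.

    Assumption: one decode step reads every weight once regardless of batch
    size, and performs flops_per_token of work per sequence in the batch.
    KV-cache traffic is ignored.
    """
    time_memory = weight_bytes / mem_bandwidth                    # weight streaming, shared by batch
    time_compute = flops_per_token * batch_size / compute_flops   # grows with batch size
    step_time = max(time_memory, time_compute)                    # whichever resource is the bottleneck
    return batch_size / step_time                                 # tokens produced per second, all requests


# Hypothetical machine: ~4 GB of quantized weights, 100 GB/s memory bandwidth,
# ~16 GFLOP per token, 10 TFLOP/s of usable compute.
for b in (1, 16, 64):
    print(b, round(decode_tokens_per_second(4e9, 100e9, 16e9, 10e12, b)))
```

Under these made-up numbers, batch size 1 is memory-bound at about 25 tokens/s, batch 16 is still memory-bound but yields about 400 tokens/s in aggregate, and batch 64 becomes compute-bound at about 625 tokens/s, while each individual step gets slower. That is the trade-off: much higher total throughput in exchange for worse per-request latency.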

The cost of not being efficient is DRAM prices even higher than they are now, given supply and demand.