by
GeekyBear
76 days ago
A discrete consumer GPU card doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.
That's why all the projects streaming models into the GPU from an SSD popped up recently.
manmal
76 days ago
Yes. There’s just no way to get above 1 token/s that way with a large model.
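A rough back-of-envelope sketch of where that ceiling comes from, assuming a dense model whose weights all have to be read from the SSD once per generated token, with nothing cached in VRAM. The function name and the example figures (70B parameters, 2 bytes per weight, 7 GB/s sequential read) are illustrative assumptions, not measurements from any of the projects mentioned.

```python
def max_tokens_per_second(param_count: float,
                          bytes_per_param: float,
                          ssd_bytes_per_second: float) -> float:
    """Upper bound on decode speed when every weight is streamed per token.

    Assumes a dense model (all parameters touched for each token) and no
    reuse of weights between tokens; real systems may cache some layers.
    """
    if param_count <= 0 or bytes_per_param <= 0 or ssd_bytes_per_second <= 0:
        raise ValueError("all inputs must be positive")
    bytes_per_token = param_count * bytes_per_param
    return ssd_bytes_per_second / bytes_per_token


if __name__ == "__main__":
    # Hypothetical numbers: 70B parameters at 16-bit, 7 GB/s NVMe read speed.
    rate = max_tokens_per_second(70e9, 2.0, 7e9)
    print(f"{rate:.2f} tokens/s upper bound")
```

With those assumed numbers the bound comes out around 0.05 tokens/s, and even 4-bit weights (0.5 bytes per parameter) only bring it to about 0.2 tokens/s, so the storage link alone keeps it well under 1 token/s.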