by
manmal
76 days ago
That’s what, 14 GB/s? The GPU’s VRAM can do 100x that.
GeekyBear
76 days ago
A discrete consumer GPU card doesn't have enough fast RAM to run a very large model that hasn't been quantized to hell.
That's why all the projects streaming models into the GPU from an SSD popped up recently.
manmal
76 days ago
Yes. There’s just no way to get above 1 token/s that way with a large model.
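Rough back-of-envelope for that claim, as a sketch only. It assumes a dense model where every weight has to be read once per generated token, so the ceiling is simply bandwidth divided by model size. The 14 GB/s figure and the "100x" VRAM figure are the ones quoted upthread; the 140 GB model size (roughly 70B parameters at 16 bits each) and the function name are illustrative assumptions, not anything from the projects being discussed.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_size_gb: float) -> float:
    """Upper bound on decode speed if all weights are read once per token.

    Assumption: dense model, no caching of weights between tokens,
    and the link runs at its full quoted bandwidth.
    """
    return bandwidth_gb_s / model_size_gb


# Figures quoted in the thread.
ssd_bandwidth_gb_s = 14.0
vram_bandwidth_gb_s = 100 * ssd_bandwidth_gb_s  # "100x that"

# Hypothetical model size: about 70B parameters at 2 bytes each.
model_size_gb = 140.0

ssd_bound = max_tokens_per_second(ssd_bandwidth_gb_s, model_size_gb)
vram_bound = max_tokens_per_second(vram_bandwidth_gb_s, model_size_gb)

print(f"SSD streaming bound: {ssd_bound:.2f} tokens/s")
print(f"VRAM bound:          {vram_bound:.2f} tokens/s")
```

Under those assumptions the SSD path stays well below 1 token/s. A mixture-of-experts model that only touches part of its weights per token would shift the numbers, so treat this as a ceiling for the dense case only.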