Hacker News
by
oreoftw
389 days ago
Most likely he was referring to the fact that you need plenty of fast GPU memory to hold the model, and GPU cards have it.
1 comment
adastra22
389 days ago
There is nothing magical about GPU memory though. It’s just faster. But people have been doing CPU inference since the first llama code came out.
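To put rough numbers on "it's just faster", here is a back-of-the-envelope sketch. It assumes the weights dominate memory use and that generating each token streams every weight through memory once, so memory bandwidth caps the token rate. The function names, the 7B parameter count, and the two bandwidth figures are illustrative assumptions, not measurements of any particular hardware.

```python
def weight_memory_gib(n_params: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the weights, in GiB."""
    return n_params * bytes_per_param / 2**30


def max_tokens_per_sec(n_params: float, bytes_per_param: float,
                       bandwidth_gb_s: float) -> float:
    """Rough upper bound on tokens/s if every weight is read once per token.

    Assumption: decoding is limited by memory bandwidth, not by compute.
    """
    model_bytes = n_params * bytes_per_param
    return bandwidth_gb_s * 1e9 / model_bytes


if __name__ == "__main__":
    n_params = 7e9         # hypothetical 7B-parameter model
    bytes_per_param = 2    # 16-bit weights

    print(f"weights: {weight_memory_gib(n_params, bytes_per_param):.1f} GiB")

    # Illustrative bandwidths only: a slower and a faster memory system.
    for label, bandwidth in [("slower memory", 50), ("faster memory", 1000)]:
        rate = max_tokens_per_sec(n_params, bytes_per_param, bandwidth)
        print(f"{label} ({bandwidth} GB/s): at most ~{rate:.1f} tokens/s")
```

Under these assumptions the model fits in either kind of memory as long as there is enough of it. The only thing that changes is the ceiling on tokens per second, which is consistent with CPU inference being possible but slower.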