Hacker News
by
pennaMan
879 days ago
I can run 4-bit on a beat-up 1070 Ti. GP is talking about higher-precision models.
sp332
879 days ago
You wouldn’t be able to fit the whole model into 8GB VRAM. It’s faster than not using a GPU at all, but most of it would still be computed on the CPU.
baq
879 days ago
IME ollama ran mixtral on a 1070 fast enough.
dimask
878 days ago
Though most probably it is not running on the 1070 but rather on the CPU. The model cannot fit on a 1070. It is not about speed: a 1070 cannot run it, period.
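A rough back-of-the-envelope check of the "cannot fit" point, as a minimal sketch. The parameter count (about 46.7 billion total for Mixtral 8x7B) and the 8 GiB card size are assumptions plugged in for illustration, and the helper name `weight_footprint_gib` is hypothetical. It counts weights only, so real usage (KV cache, quantization overhead) would be somewhat higher.

```python
def weight_footprint_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


# Assumed values, not taken from the thread:
MIXTRAL_PARAMS = 46.7e9   # approximate total parameters of Mixtral 8x7B
GTX_1070_VRAM_GIB = 8.0   # VRAM on a GTX 1070

for bits in (16, 8, 4):
    size = weight_footprint_gib(MIXTRAL_PARAMS, bits)
    fits = size <= GTX_1070_VRAM_GIB
    print(f"{bits}-bit weights: {size:.1f} GiB, fits in 8 GiB VRAM: {fits}")
```

Under these assumptions even the 4-bit weights come to well over 20 GiB, so on an 8 GiB card most of the model has to live in system RAM.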
Dkuku
878 days ago
In llama.cpp you can offload some of the layers to the GPU with -ngl X, where X is the number of layers to offload.
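One way to pick a starting value for X is sketched below. This is only an estimate, not anything llama.cpp does itself: it assumes all layers are about the same size, and the model size (24 GiB), layer count (32), and VRAM reserve (1.5 GiB for KV cache and scratch buffers) are illustrative guesses. The function name `layers_that_fit` is hypothetical.

```python
def layers_that_fit(model_gib: float, n_layers: int, vram_gib: float,
                    reserve_gib: float = 1.5) -> int:
    """Estimate how many equally sized layers fit in VRAM after a reserve."""
    per_layer_gib = model_gib / n_layers          # assume uniform layer size
    usable_gib = max(vram_gib - reserve_gib, 0.0)  # keep headroom for cache/buffers
    return min(n_layers, int(usable_gib // per_layer_gib))


# Illustrative numbers only: a 24 GiB quantized model with 32 layers, 8 GiB card.
ngl = layers_that_fit(model_gib=24.0, n_layers=32, vram_gib=8.0)
print(f"try: -ngl {ngl}")
```

In practice you would start near the estimate and lower X if llama.cpp runs out of VRAM, or raise it if memory is left over.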