by tgtweak, 50 days ago
Depends entirely on quantization. Q6_K with max context length (262144) is ~40GB of VRAM.
Q8 with the same context wouldn't fit in 48GB of VRAM, but it did fit with 128k of context.
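
For a rough idea of why both quantization and context length matter, here is a back-of-the-envelope sketch. It assumes VRAM is roughly quantized weights plus an fp16 KV cache and ignores compute buffers and runtime overhead. The model shape below (parameter count, layers, KV heads, head dim) is hypothetical and not taken from the model discussed here, so the output will not reproduce the numbers above. The bits-per-weight values are the nominal llama.cpp figures for Q6_K and Q8_0.

```python
# Nominal bits per weight for two llama.cpp quantization types.
BITS_PER_WEIGHT = {"Q6_K": 6.5625, "Q8_0": 8.5}


def estimate_vram_gib(n_params, bits_per_weight, n_layers, n_kv_heads,
                      head_dim, context_len, kv_bytes_per_elem=2):
    """Rough estimate in GiB: quantized weights + KV cache (no overhead)."""
    weight_bytes = n_params * bits_per_weight / 8
    # K and V each store n_kv_heads * head_dim values per layer per token.
    kv_bytes = (2 * n_layers * n_kv_heads * head_dim
                * context_len * kv_bytes_per_elem)
    return (weight_bytes + kv_bytes) / 2**30


# Hypothetical model shape, for illustration only.
MODEL = dict(n_params=30e9, n_layers=48, n_kv_heads=4, head_dim=128)

for quant, bpw in BITS_PER_WEIGHT.items():
    for ctx in (131072, 262144):
        gib = estimate_vram_gib(bits_per_weight=bpw, context_len=ctx, **MODEL)
        print(f"{quant} @ {ctx} ctx: ~{gib:.1f} GiB")
```

The weight term scales with the quantization and the KV cache term scales linearly with context, which is why halving the context can bring a Q8 model back under a fixed VRAM budget.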