Hacker News
by
genpfault
175 days ago
Getting ~150 tok/s on an empty context with a 24 GB 7900 XTX via llama.cpp's Vulkan backend.
1 comment
Tepix
173 days ago
Again, you're using third-party quantisations, not the weights supplied by Nvidia (which don't fit in 24 GB).