| HN Mirror



	by genpfault 224 days ago
	Getting ~150 tok/s on an empty context with a 24 GB 7900XTX via llama.cpp's Vulkan backend.

1 comment

Again, you're using third-party quantisations, not the weights supplied by Nvidia (which don't fit in 24 GB).