by polishgladiator 1046 days ago
I've been doing some hacking with Llama2 on an AMD 7900 XTX this weekend, using llama.cpp and q5_k_s quantization.

Compared to MK600 on an RTX 4090 in their data, I am measuring higher throughput and lower perplexity (again, note that I am using a cheaper GPU!)...
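
For anyone wanting to sanity-check numbers like these, here is a rough sketch of how the two metrics are usually defined: perplexity as the exponential of the mean negative log-likelihood per token, and throughput as tokens divided by wall-clock time. The function names and inputs are hypothetical and for illustration only; this is not how llama.cpp computes or reports them internally.

```python
import math


def perplexity(token_logprobs):
    """Perplexity from per-token natural-log probabilities.

    Assumes token_logprobs holds log p(token | context) for each
    evaluated token (hypothetical input, not a llama.cpp structure).
    """
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def tokens_per_second(n_tokens, elapsed_seconds):
    """Throughput as tokens processed per second of wall-clock time."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return n_tokens / elapsed_seconds
```

Lower perplexity means the model assigns higher probability to the evaluation text, so comparisons only make sense on the same text, context length, and tokenizer.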