by polishgladiator
1046 days ago
I've been doing some hacking with Llama 2 on an AMD 7900 XTX this weekend, using llama.cpp and q5_k_s quantization. Compared to their data for MK600 on an RTX 4090, I am measuring higher throughput and lower perplexity (again, note that I am using a cheaper GPU!)...