by throwdbaaway
99 days ago
Using ik_llama.cpp to run a 27B 4bpw quant on an RTX 3090, I get 1312 tok/s PP (prompt processing) and 40.7 tok/s TG (token generation) at zero context, dropping to 1009 tok/s PP and 36.2 tok/s TG at 40960 context. The 35B A3B is faster but didn't do too well in my limited testing.
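
For a rough sense of scale, the relative slowdown between zero context and 40960 context can be worked out from those four figures. This is only a back-of-the-envelope sketch: the helper name `pct_drop` is made up for illustration, and it says nothing about how throughput behaves between the two measured points.

```python
def pct_drop(before: float, after: float) -> float:
    """Relative decrease from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures quoted above: tok/s at zero context vs. at 40960 context.
pp_drop = pct_drop(1312, 1009)   # prompt processing
tg_drop = pct_drop(40.7, 36.2)   # token generation

print(f"PP drop: {pp_drop:.1f}%")  # PP drop: 23.1%
print(f"TG drop: {tg_drop:.1f}%")  # TG drop: 11.1%
```

So on these numbers prompt processing loses roughly a quarter of its speed at 40960 context, while token generation loses about a tenth.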