by janwas
847 days ago
For the 7B IT model and a short factual query, I see 5.3 tokens/sec on a 5-year-old Skylake Gold 6154 CPU @ 3.00GHz with 16 threads.
Expect a slight increase as we improve scalability. FYI, using the NUQ (4.5-bit) quantization improves throughput by about 1.4x.