Hacker News new | ask | show | jobs
by janwas 847 days ago
For the 7B IT and a short factual query I see 5.3 tps on a 5 year old Skylake Gold 6154 CPU @ 3.00GHz, 16 threads. Expect a slight increase as we improve scalability.

FYI using the NUQ (4.5-bit) quantization improves throughput by about 1.4x.