Hacker News
by naasking, 95 days ago
I think the README [1] for the new CPU feature is of more interest: it shows speedups that scale linearly with the number of threads, up to 73 tokens/s with 8 threads (64 tokens/s for their recommended Q6 quant):
[1] https://github.com/microsoft/BitNet/blob/main/src/README.md