by nekusar
75 days ago
https://github.com/brontoguana/krasis On my desktop with an RTX 5060 Ti (16GB) and 96GB of RAM, I routinely get 25-30 tokens/sec using an 80B model quantized to int8. It uses 65GB of system RAM and 15GB of VRAM, and it's plenty fast for many of my purposes. I could easily run a 30B model in bf16 (full precision) and get around 50 tok/s.
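
As a rough sanity check on those numbers, here is a back-of-envelope sketch in Python. It is not anything from krasis: the helper name `weight_memory_gb` is hypothetical, and it assumes 1 byte per parameter for int8, 2 bytes for bf16, and counts weights only (no KV cache, activations, or runtime overhead).

```python
# Back-of-envelope estimate of weight memory. Assumptions (not from the
# original comment): 1 GB = 1e9 bytes, weights only, no KV cache or overhead.
BYTES_PER_PARAM = {"int8": 1, "bf16": 2}


def weight_memory_gb(params_billions: float, dtype: str) -> float:
    """Approximate GB needed to hold the weights of a model."""
    # Billions of parameters times bytes per parameter gives GB directly.
    return float(params_billions * BYTES_PER_PARAM[dtype])


if __name__ == "__main__":
    # 80B at int8: compare with the reported 65GB system RAM + 15GB VRAM.
    print("80B int8:", weight_memory_gb(80, "int8"), "GB")
    # 30B at bf16: compare with the 96GB + 16GB available on this machine.
    print("30B bf16:", weight_memory_gb(30, "bf16"), "GB")
```

Under those assumptions the 80B int8 weights come to about 80GB, which lines up with the 65GB + 15GB split reported above, and a 30B bf16 model would need about 60GB for weights, so it should fit on the same machine.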