by lostmsu
6 days ago
Assuming you magically use all 128GiB of xRAM, you need to read ~32GiB per token in batched mode. On a good SSD that works out to about 1/3 of a token per second. Cool, double that and you get 2/3 of a token per second. Let's assume you are lucky and can actually do 6/7 tokens per second. That's still an extremely far cry from the 20+ tokens per second of 27B before any batching.
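The arithmetic above can be sketched as a quick back-of-the-envelope calculation. This is only an illustration: the ~32GiB-per-token and 20 tokens per second figures come from the comment, while the SSD throughput of about 10.7GiB/s is an assumption, picked because it is what the 1/3 tokens per second figure implies. The function names are hypothetical.

```python
# Back-of-the-envelope: decode speed when every token requires streaming
# a fixed amount of weights from storage.

def tokens_per_second(read_gib_per_token: float, bandwidth_gib_per_s: float) -> float:
    """Upper bound on tokens/s if reads are the only bottleneck."""
    return bandwidth_gib_per_s / read_gib_per_token


def required_bandwidth(read_gib_per_token: float, target_tokens_per_s: float) -> float:
    """Bandwidth (GiB/s) needed to hit a target tokens/s."""
    return read_gib_per_token * target_tokens_per_s


READ_GIB_PER_TOKEN = 32.0          # from the comment: ~32GiB read per token
ASSUMED_SSD_GIB_PER_S = 32.0 / 3   # ASSUMPTION: ~10.7GiB/s, implied by the 1/3 tok/s figure

single = tokens_per_second(READ_GIB_PER_TOKEN, ASSUMED_SSD_GIB_PER_S)
doubled = tokens_per_second(READ_GIB_PER_TOKEN, 2 * ASSUMED_SSD_GIB_PER_S)
needed_for_20 = required_bandwidth(READ_GIB_PER_TOKEN, 20.0)

print(f"one SSD:       {single:.2f} tok/s")
print(f"2x bandwidth:  {doubled:.2f} tok/s")
print(f"for 20 tok/s:  {needed_for_20:.0f} GiB/s of read bandwidth")
```

Under these assumptions, matching 20 tokens per second would take roughly 60x the read bandwidth of the single SSD, which is the gap the comment is pointing at.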