by muskmusk
1033 days ago
That is impressive. Interesting that the prefill (I am guessing this is prompt processing) is so much slower than decoding. It's my understanding that under normal circumstances decoding is memory-bandwidth bound, whereas prompt processing isn't, because of batching. Is there some quirk in your setup?
It's slow but usable!
prefill: 2.1963 tokens/sec, decoding: 3.4708 tokens/sec
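
For reference, the expectation that batched prompt processing should outrun decoding can be sketched with a rough roofline estimate. This is only a back-of-the-envelope model, not a description of the setup being discussed: the model size, quantization, bandwidth, and compute figures below are hypothetical, and the model ignores attention and KV-cache costs.

```python
def step_time_s(n_params, batch_tokens, bytes_per_param, mem_bw, flops):
    """Roofline estimate of one forward pass over `batch_tokens` tokens."""
    # The weights are streamed from memory once per pass,
    # no matter how many tokens share that pass.
    memory_time = n_params * bytes_per_param / mem_bw
    # Roughly 2 FLOPs (multiply + add) per parameter per token.
    compute_time = 2 * n_params * batch_tokens / flops
    # The slower of the two resources bounds the pass.
    return max(memory_time, compute_time)


def tokens_per_sec(n_params, batch_tokens, bytes_per_param, mem_bw, flops):
    return batch_tokens / step_time_s(
        n_params, batch_tokens, bytes_per_param, mem_bw, flops
    )


# Hypothetical numbers, chosen only for illustration.
N_PARAMS = 7e9          # 7B-parameter model
BYTES_PER_PARAM = 0.5   # 4-bit quantized weights
MEM_BW = 50e9           # 50 GB/s memory bandwidth
FLOPS = 1e12            # 1 TFLOP/s sustained compute

# Decoding handles one token per pass; prefill batches many prompt tokens.
decode_tps = tokens_per_sec(N_PARAMS, 1, BYTES_PER_PARAM, MEM_BW, FLOPS)
prefill_tps = tokens_per_sec(N_PARAMS, 64, BYTES_PER_PARAM, MEM_BW, FLOPS)

print(f"decode:  {decode_tps:.2f} tokens/sec")
print(f"prefill: {prefill_tps:.2f} tokens/sec")
```

Under these assumptions decoding is limited by weight reads and prefill by compute, so prefill comes out faster. Seeing the reverse, as in the figures quoted above, would suggest that prompt tokens are not actually being batched, or that something else in the setup dominates.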