by terhechte
434 days ago
I don't think the time grows linearly. The more context, the slower it gets (at least in my experience, because the system has to throttle). I just tried 2k tokens on the same model I used for the 120k test some weeks ago (qwen 2.5 32b q8), and it took 12 seconds to first token.
|
IIUC, the data we have so far:
2K tokens / 12 seconds ≈ 167 tokens/s prefill
120K tokens / 10 minutes (600 seconds) = 200 tokens/s prefill
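For anyone who wants to redo the arithmetic, here is a rough sketch. It assumes 2K means 2,000 tokens and 120K means 120,000 tokens, and that the whole time to first token is spent on prefill (it ignores model load time and other fixed overhead, which would hurt the short prompt more). The function name `prefill_rate` is just something made up for illustration.

```python
def prefill_rate(prompt_tokens: int, seconds_to_first_token: float) -> float:
    """Average prefill throughput in tokens per second.

    Assumption: all of the time to first token is prompt processing.
    """
    return prompt_tokens / seconds_to_first_token


# The two data points reported in the thread (token counts are rounded guesses).
short_rate = prefill_rate(2_000, 12)       # 2K-token prompt, 12 s
long_rate = prefill_rate(120_000, 10 * 60)  # 120K-token prompt, 10 min

print(f"2K prompt:   {short_rate:.1f} tokens/s")
print(f"120K prompt: {long_rate:.1f} tokens/s")
print(f"long / short throughput ratio: {long_rate / short_rate:.2f}")
```

With only two measurements taken weeks apart, the ratio says little about how prefill time scales with context length. It only shows that these numbers do not suggest the long prompt was slower per token.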