Hacker News
by antirez 39 days ago
Prefill is 400 t/s on that hardware. It's just that with a very short prompt you can't see the real speed, because it will default to single-token context processing.
1 comment

Hah, that's my fault for just using "Generate an SVG of a pelican riding a bicycle" as my test prompt!