Hacker News
by
antirez
39 days ago
Prefill is 400 t/s on that hardware. It's just that if the prompt is very short, you can't see the real speed, and it will default to single-token context processing.
simonw
39 days ago
Hah, that's my fault for just using "Generate an SVG of a pelican riding a bicycle" as my test prompt!
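To make the short-prompt effect concrete, here is a toy model in Python. It is only a sketch, not code from any real inference engine: the 400 t/s batched prefill figure comes from the comment above, while the single-token rate (20 t/s), the batching threshold (32 tokens), and the function names are all made-up assumptions chosen for illustration.

```python
def prompt_processing_seconds(n_tokens, batched_tps=400.0,
                              single_token_tps=20.0, min_batch_tokens=32):
    """Toy estimate of how long it takes to process a prompt.

    Assumptions (hypothetical, for illustration only):
    - prompts shorter than min_batch_tokens are processed one token
      at a time at single_token_tps;
    - longer prompts are processed in batches at batched_tps.
    """
    if n_tokens < min_batch_tokens:
        return n_tokens / single_token_tps
    return n_tokens / batched_tps


def observed_tps(n_tokens, **kwargs):
    """Tokens per second a benchmark would report for this prompt."""
    return n_tokens / prompt_processing_seconds(n_tokens, **kwargs)


if __name__ == "__main__":
    # Compare a one-line prompt with progressively longer ones.
    for n in (12, 64, 2000):
        print(f"{n:>5} prompt tokens -> {observed_tps(n):.0f} t/s observed")
```

Under these assumptions, a one-line prompt of a dozen tokens reports the slow single-token rate, and only a prompt long enough to be batched shows the real prefill speed.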