Hacker News
by
xienze
40 days ago
I don't want to be a jerk but 31t/s prefill is basically unusable in an agentic situation. A mere 10k in context and you're sitting there for 5+ minutes before the first token is generated.
3 comments
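The arithmetic behind that estimate can be sketched as follows. This is a rough back-of-the-envelope calculation, not a benchmark: it assumes prefill throughput stays constant across the whole context, which real hardware does not guarantee, and `prefill_seconds` is a hypothetical helper name. The rates are the 31 t/s figure and the 200-300 range quoted in this thread.

```python
def prefill_seconds(context_tokens: int, prefill_tokens_per_second: float) -> float:
    """Estimate time to first token, assuming a constant prefill rate."""
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill rate must be positive")
    return context_tokens / prefill_tokens_per_second


# Compare the rates mentioned in the thread for a 10k-token context.
for rate in (31, 200, 300):
    seconds = prefill_seconds(10_000, rate)
    print(f"{rate} t/s: {seconds:.0f} s ({seconds / 60:.1f} min)")
```

At 31 t/s a 10k-token context takes roughly 5.4 minutes to prefill, which matches the "5+ minutes" figure; at 200 t/s the same context takes 50 seconds.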
fgfarben
40 days ago
That prefill number isn't right. M4 Max hits 200-300 t/s:
https://github.com/antirez/ds4/blob/main/speed-bench/m4_max_...
hadlock
40 days ago
M5 studio is gonna sell like hot cakes
throwdbaaway
40 days ago
Hah, that's because the prompt itself was only about 30 tokens. We need a much bigger prompt to properly test PP (prompt processing).
aiscoming
40 days ago
If it's just the coding agent system prompt and tools, you can cache that.
xienze
40 days ago
Yeah the problem is that's just the start of the context. There's, you know, all the tool call results and file reads and stuff.
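To put rough numbers on the caching exchange above, here is a minimal sketch. It assumes a cache hit on the prefix costs nothing and that the remaining tokens are prefilled at a constant rate. The 4,000-token prefix size is a made-up illustration, not a measurement from the thread, and `uncached_prefill_seconds` is a hypothetical helper.

```python
def uncached_prefill_seconds(
    total_context_tokens: int,
    cached_prefix_tokens: int,
    prefill_tokens_per_second: float,
) -> float:
    """Estimate prefill time when a leading prefix is served from cache.

    Assumes the cached prefix is free and the rest runs at a constant rate.
    """
    if prefill_tokens_per_second <= 0:
        raise ValueError("prefill rate must be positive")
    if not 0 <= cached_prefix_tokens <= total_context_tokens:
        raise ValueError("cached prefix must fit inside the total context")
    remaining = total_context_tokens - cached_prefix_tokens
    return remaining / prefill_tokens_per_second


# Hypothetical split: 4,000 tokens of system prompt and tool definitions
# (cacheable), 6,000 tokens of tool call results and file reads (not cached).
seconds = uncached_prefill_seconds(10_000, 4_000, 31)
print(f"{seconds:.0f} s ({seconds / 60:.1f} min) still spent on prefill")
```

Under these assumptions caching the prefix helps, but the tool results and file reads still have to be prefilled, and at 31 t/s that remains a wait of several minutes.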