Was just about to post that Haiku 4.5 does something I have never encountered before: there is a massive delta in token/sec depending on the query. Some variance, including task-specific variance, is of course nothing new, but it has never been as pronounced and reproducible as here.

A few examples, prompted between 21:30 and 23:00 UTC via T3 Chat [0]:

Prompt 1 - 120.65 token/sec - https://t3.chat/share/tgqp1dr0la
Prompt 2 - 118.58 token/sec - https://t3.chat/share/86d93w093a
Prompt 3 - 203.20 token/sec - https://t3.chat/share/h39nct9fp5
Prompt 4 - 91.43 token/sec - https://t3.chat/share/mqu1edzffq
Prompt 5 - 167.66 token/sec - https://t3.chat/share/gingktrf2m
Prompt 6 - 161.51 token/sec - https://t3.chat/share/qg6uxkdgy0
Prompt 7 - 168.11 token/sec - https://t3.chat/share/qiutu67ebc
Prompt 8 - 203.68 token/sec - https://t3.chat/share/zziplhpw0d
Prompt 9 - 102.86 token/sec - https://t3.chat/share/s3hldh5nxs
Prompt 10 - 174.66 token/sec - https://t3.chat/share/dyyfyc458m
Prompt 11 - 199.07 token/sec - https://t3.chat/share/7t29sx87cd
Prompt 12 - 82.13 token/sec - https://t3.chat/share/5ati3nvvdx
Prompt 13 - 94.96 token/sec - https://t3.chat/share/q3ig7k117z
Prompt 14 - 190.02 token/sec - https://t3.chat/share/hp5kjeujy7
Prompt 15 - 190.16 token/sec - https://t3.chat/share/77vs6yxcfa
Prompt 16 - 92.45 token/sec - https://t3.chat/share/i0qrsvp29i
Prompt 17 - 190.26 token/sec - https://t3.chat/share/berx0aq3qo
Prompt 18 - 187.31 token/sec - https://t3.chat/share/0wyuk0zzfc
Prompt 19 - 204.31 token/sec - https://t3.chat/share/6vuawveaqu
Prompt 20 - 135.55 token/sec - https://t3.chat/share/b0a11i4gfq
Prompt 21 - 208.97 token/sec - https://t3.chat/share/al54aha9zk
Prompt 22 - 188.07 token/sec - https://t3.chat/share/wu3k8q67qc
Prompt 23 - 198.17 token/sec - https://t3.chat/share/0bt1qrynve
Prompt 24 - 196.25 token/sec - https://t3.chat/share/nhnmp0hlc5
Prompt 25 - 185.09 token/sec - https://t3.chat/share/ifh6j4d8t5

I ran each prompt three times and got the same token/sec result for the respective prompt each time (within expected variance, meaning less than ±5%). Each run used Claude Haiku 4.5 with "High reasoning". Will continue testing, but this is beyond odd.

I will add that my very early evals leaned heavily into pure code output, where 200 token/sec is consistently possible at the moment, but that is certainly not the average as I claimed before; there I was mistaken. That being said, even across a wider range of challenges the average is above 160 token/sec, and if you focus solely on coding, whether Rust or React, Haiku 4.5 is very swift.

[0] I don't normally use T3 Chat for evals, it is just easier to share prompts this way, though I was disappointed to find that the model information (token/sec, TTF, etc.) can't be enabled without an account. Also, these aren't the prompts I usually use for evals. Those I try to keep somewhat out of training data by only running benchmarks through the paid API. As anything on Hacker News is most assuredly part of model training, I decided to write some quick and dirty prompts to highlight what I have been seeing.
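
For anyone who wants to check the "above 160 token/sec" figure against the 25 readings listed above, here is a minimal Python sketch. The numbers are copied from the list; the function name `summarize` and the choice of statistics are my own assumptions for illustration, not part of the original measurements.

```python
from statistics import mean, median

# Throughput in token/sec for prompts 1-25, copied from the list above.
TOKENS_PER_SEC = [
    120.65, 118.58, 203.20, 91.43, 167.66,
    161.51, 168.11, 203.68, 102.86, 174.66,
    199.07, 82.13, 94.96, 190.02, 190.16,
    92.45, 190.26, 187.31, 204.31, 135.55,
    208.97, 188.07, 198.17, 196.25, 185.09,
]


def summarize(samples):
    """Return basic summary statistics for a list of token/sec readings."""
    return {
        "n": len(samples),
        "min": min(samples),
        "max": max(samples),
        "mean": mean(samples),
        "median": median(samples),
        # Ratio of fastest to slowest reading, a rough measure of the delta.
        "spread": max(samples) / min(samples),
    }


if __name__ == "__main__":
    for name, value in summarize(TOKENS_PER_SEC).items():
        print(f"{name}: {value:.2f}")
```

On these 25 readings the mean comes out at roughly 162 token/sec and the median at roughly 185, with the fastest prompt about 2.5 times faster than the slowest, which matches the impression of two clusters rather than noise around one average.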
Anthropic mentioned that this model is more than twice as fast as Claude Sonnet 4 [2], for which OpenRouter reports an average of 61.72 tps [3]. If these numbers hold, we're really looking at an almost 3x improvement in throughput and less than half the initial latency.
[1] https://openrouter.ai/anthropic/claude-haiku-4.5
[2] https://www.anthropic.com/news/claude-haiku-4-5
[3] https://openrouter.ai/anthropic/claude-sonnet-4
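
To make the "twice as fast" and "almost 3x" claims concrete, here is a small Python sketch of the arithmetic against the 61.72 tps Sonnet 4 figure quoted above. The helper names `speedup` and `required_tps` are hypothetical, and no Haiku 4.5 throughput is hard-coded, since the comment does not quote one; plug in whatever number you measure.

```python
# OpenRouter average for Claude Sonnet 4, as quoted in the comment above [3].
SONNET_4_TPS = 61.72


def speedup(candidate_tps, baseline_tps=SONNET_4_TPS):
    """Throughput ratio of a candidate model relative to the baseline."""
    if baseline_tps <= 0:
        raise ValueError("baseline_tps must be positive")
    return candidate_tps / baseline_tps


def required_tps(factor, baseline_tps=SONNET_4_TPS):
    """Throughput a candidate needs to be `factor` times the baseline."""
    return factor * baseline_tps


if __name__ == "__main__":
    for factor in (2, 3):
        print(f"{factor}x Sonnet 4 = {required_tps(factor):.2f} tps")
```

With that baseline, 2x corresponds to 123.44 tps and 3x to 185.16 tps, so a sustained reading in the mid-180s or higher would be needed to call it a 3x improvement.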