by stingraycharles 73 days ago
And I wonder how redacting them reduces latency, as it sure as hell doesn’t make the responses any faster and bandwidth isn’t the issue here.

They provide thinking summaries, so I assume they have to call Haiku or some other model to summarise the thinking blocks.
Isn’t that asynchronous? Wouldn’t it make more sense to disable the thinking summaries in those cases rather than hide the thinking altogether?