by Xorlev 36 days ago
> i.e., claude code and similar, things are either prefill-bound

Prefix caching greatly accelerates each turn. Once you account for it, and barring large file reads, prefill still isn't the bottleneck compared with decoding reasoning tokens. Same goes for script-writing.

This is especially true during exploration phases: when traversing directory trees and grepping files, you're talking about a few hundred tokens per turn.
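
Back-of-envelope version of the argument, as a rough sketch. The throughput figures, the token counts, and the `turn_seconds` helper are all made up for illustration, and it assumes a cached prefix costs nothing to reuse, which is a simplification:

```python
# All throughput numbers are illustrative assumptions, not measurements.
PREFILL_TOK_PER_S = 5_000.0  # assumed prefill throughput
DECODE_TOK_PER_S = 50.0      # assumed decode throughput


def turn_seconds(history_tokens, new_input_tokens, output_tokens, prefix_cached):
    """Return (prefill_s, decode_s) for one agent turn.

    Simplification: reusing a cached prefix is treated as free.
    """
    if prefix_cached:
        uncached_tokens = new_input_tokens
    else:
        uncached_tokens = history_tokens + new_input_tokens
    prefill_s = uncached_tokens / PREFILL_TOK_PER_S
    decode_s = output_tokens / DECODE_TOK_PER_S
    return prefill_s, decode_s


# Hypothetical exploration turn: 50k tokens of history, 300 new tokens of
# grep output, 400 decoded tokens of reasoning plus the next tool call.
cached = turn_seconds(50_000, 300, 400, prefix_cached=True)
uncached = turn_seconds(50_000, 300, 400, prefix_cached=False)
print(f"cached:   prefill {cached[0]:.2f}s, decode {cached[1]:.2f}s")
print(f"uncached: prefill {uncached[0]:.2f}s, decode {uncached[1]:.2f}s")
```

Under those assumed numbers, decode dominates the cached turn by a wide margin, and prefill only becomes comparable when the whole history has to be reprocessed.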