by Xorlev
36 days ago
> i.e., claude code and similar, things are either prefill-bound

Once you account for prefix caching, prefill on each turn gets much faster. Barring large file reads, prefill still isn't the bottleneck compared to decoding reasoning tokens; the same goes for script-writing. This is especially true during exploration phases: when traversing directory trees and grepping files, you're talking about a few hundred tokens per turn.
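
To put rough numbers on that, here is a back-of-the-envelope sketch. Everything in it is an assumption for illustration: the `turn_latency` helper is hypothetical, and the throughput and token counts are made up rather than measured on any real serving stack. It only shows how the arithmetic plays out when prefix caching means just the newly appended tokens need prefilling.

```python
# Back-of-the-envelope model of one agent turn.
# All numbers are illustrative assumptions, not measurements.

def turn_latency(context_tokens, new_tokens, decode_tokens,
                 prefill_tps, decode_tps, prefix_cached=True):
    """Return (prefill_seconds, decode_seconds) for a single turn."""
    # With prefix caching, only tokens appended since the previous
    # turn are prefilled; without it, the whole context is reprocessed.
    to_prefill = new_tokens if prefix_cached else context_tokens + new_tokens
    return to_prefill / prefill_tps, decode_tokens / decode_tps


# Hypothetical exploration turn: a grep result adds ~300 tokens to a
# 50k-token context, and the model decodes ~500 reasoning tokens.
prefill_s, decode_s = turn_latency(
    context_tokens=50_000, new_tokens=300, decode_tokens=500,
    prefill_tps=5_000.0, decode_tps=50.0)

# Same turn with the cache disabled, for comparison.
uncached_prefill_s, _ = turn_latency(
    context_tokens=50_000, new_tokens=300, decode_tokens=500,
    prefill_tps=5_000.0, decode_tps=50.0, prefix_cached=False)

print(f"cached prefill:   {prefill_s:.2f} s")
print(f"decode:           {decode_s:.2f} s")
print(f"uncached prefill: {uncached_prefill_s:.2f} s")
```

Under these assumed rates, decode dominates the cached turn by a couple of orders of magnitude, and prefill only becomes comparable when the cache is missed or a large file lands in the context.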