|
|
|
|
|
by MehdiBelkacem
45 days ago
|
|
Token anxiety is real. What worked for me:
prompt caching on fixed system prompts cut my Anthropic
bill by ~60% overnight. Most devs don't realize cache
writes are 25x cheaper than input tokens on Claude. Local models for classification/routing + frontier only
for generation is the other move — but the latency
tradeoff is real if you're in a user-facing flow. |
|