Hacker News new | ask | show | jobs
by zozbot234 4 days ago
If input tokens dominate the cost to that extent, this implies that major gains are possible by making better use of caching. You could basically ask the model to do a one-time "compaction" step including a dump of the relevant portions of the code, and use that as the cached prefix for a large amount of "swarm" subagent calls.