| Interesting, I’ve never needed 1M, or even 250k+ context. I’m usually under 100k per request. About 80% of my code is AI-generated, with a controlled workflow using dev-chat.md and spec.md. I use Flash for code maps and auto-context, and GPT-4.5 or Opus for coding, all via API with a custom tool. Gemini Pro and Flash have had 1M context for a long time, but even though I use Flash 3 a lot, and it’s awesome, I’ve never needed more than 200k. For production coding, I use - a code map strategy on a big repo. Per file: summary, when_to_use, public_types, public_functions. This is done per file and saved until the file changes. With a concurrency of 32, I can usually code-map a huge repo in minutes. (Typically Flash, cheap, fast, and with very good results) - Then, auto context, but based on code lensing. Meaning auto context takes some globs that narrow the visibility of what the AI can see, and it uses the code map intersection to ask the AI for the proper files to put in context. (Typically Flash, cheap, relatively fast, and very good) - Then, use a bigger model, GPT 5.4 or Opus 4.6, to do the work. At this point, context is typically between 30k and 80k max. What I’ve found is that this process is surprisingly effective at getting a high-quality response in one shot. It keeps everything focused on what’s needed for the job. Higher precision on the input typically leads to higher precision on the output. That’s still true with AI. For context, 75% of my code is Rust, and the other 25% is TS/CSS for web UI. Anyway, it’s always interesting to learn about different approaches. I’d love to understand the use case where 1M context is really useful. |
You have to make sure the semantic summary takes up significantly less tokens than just reading the code or its just a waste of token/time.
Then have a skill that uses git version logs to perform lazy summary cache when needed.