I've always been amazed at how terrible most frontier LLMs are at compaction, given how embarrassingly easy it is to come up with half a dozen different RL training evals that would teach models to generate useful context summaries. Heck, you could bolt it onto any existing RL eval just by forcing a compaction every three turns.
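
To make the "force a compaction every three turns" idea concrete, here is a rough sketch of a wrapper around an episode loop. Everything in it is hypothetical: `run_with_forced_compaction`, the `Policy` and `Summarizer` callables, and the choice to replace the whole context with the summary are assumptions for illustration, not any lab's actual training setup.

```python
from typing import Callable, List

# Hypothetical interfaces: a policy maps the visible context to the next
# message, and a summarizer maps the visible context to a summary string.
Policy = Callable[[List[str]], str]
Summarizer = Callable[[List[str]], str]


def run_with_forced_compaction(
    task_prompt: str,
    policy: Policy,
    summarizer: Summarizer,
    num_turns: int,
    compact_every: int = 3,
) -> List[str]:
    """Run an episode, swapping the context for a summary every `compact_every` turns."""
    context = [task_prompt]
    for turn in range(1, num_turns + 1):
        context.append(policy(context))
        if turn % compact_every == 0:
            # Assumption: the summary replaces the entire context, so anything
            # the policy needs later has to survive inside the summary.
            context = [summarizer(context)]
    return context
```

The existing eval's reward would then be computed on the final outcome as usual, so a summary only earns credit if it kept whatever the later turns needed.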