I would think there is some benefit to synthesizing and compressing. Summarization is similar, in that the more heavily weighted text remains and the rest is pruned.
If the same basic information appears all over a text, combine it.
I guess I’m proposing a new kind of compression: new substitutions, with the LLM inventing new words to compress common ideas. A bytecode, if you will. Compiling the context down.
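A toy sketch of the substitution idea, using ordinary dictionary substitution rather than anything an LLM does internally. It assumes whitespace-separated words, fixed-length phrases, and a hypothetical `<k>` code format; the function names are made up for illustration. Recurring phrases get a short code, and a codebook maps codes back to phrases.

```python
from collections import Counter


def compress(text, n=3, min_count=2, max_codes=10):
    """Replace recurring n-word phrases with short codes (hypothetical scheme).

    Assumes the input never contains tokens that look like codes ("<k>"),
    and normalizes whitespace to single spaces.
    """
    words = text.split()
    # Count every n-word phrase (overlapping windows).
    counts = Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))
    phrase_to_code = {}
    for phrase, count in counts.most_common(max_codes):
        if count < min_count:
            break
        phrase_to_code[phrase] = f"<{len(phrase_to_code)}>"

    # Greedy left-to-right substitution.
    out, i = [], 0
    while i < len(words):
        chunk = tuple(words[i:i + n])
        if chunk in phrase_to_code:
            out.append(phrase_to_code[chunk])
            i += n
        else:
            out.append(words[i])
            i += 1

    # Keep only the codes that were actually used.
    used = set(out)
    codebook = {code: phrase for phrase, code in phrase_to_code.items() if code in used}
    return " ".join(out), codebook


def decompress(compressed, codebook):
    """Expand each code back into its phrase; other tokens pass through."""
    words = []
    for token in compressed.split():
        words.extend(codebook.get(token, (token,)))
    return " ".join(words)
```

For example, `compress("the cat sat on the mat and the cat sat on the rug")` turns the repeated "the cat sat" into a single code and stores it once in the codebook. The open question is whether a model could read such invented codes as reliably as plain text.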
The question is how good they are at retaining minute details from extremely long contexts, say 200k tokens.
That’s the frontier Claude and now GPT-4 Turbo are pushing.