Very interesting. The way I understand it, the researchers found a clever architectural hack to stop the AI from hoarding memory when reading long documents. Normally, when an AI transcribes a 100-page PDF, it tries to remember every single word it has already ingested. This short-term memory (the KV cache) grows linearly, O(N), until the model runs out of VRAM and crashes (or hits a context cap). To avoid this, developers are forced to write janky code that chops PDFs into individual pages, processes them one by one, and glues the text back together.

Unlimited OCR uses Reference Sliding Window Attention (R-SWA) to split the AI's focus into two paths:

Global Reference: The AI keeps full, uncompromised sight of the original document image so it never loses context.
Local Generation: The AI restricts its memory of its own typed text to a tight, moving window (like the last 128 words) and safely forgets the rest.

This will be very interesting for local AI, and I can’t wait to see what the community builds and extends with it!
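To make the two paths concrete, here is a minimal sketch of the attention pattern as I read it from the description above. It is not the paper's implementation: the function names, the mask layout, and the exact window semantics are my own assumptions for illustration.

```python
def rswa_mask(n_ref, n_gen, window):
    """Boolean attention mask for the generated tokens (hypothetical helper).

    Row i is the i-th generated token. Columns are laid out as
    [reference tokens..., generated tokens...]. True means "may attend".
    """
    mask = []
    for i in range(n_gen):
        # Global reference path: every reference token stays visible.
        row = [True] * n_ref
        for j in range(n_gen):
            # Local generation path: causal, and limited to the last
            # `window` generated tokens (the current token included).
            row.append(i - window < j <= i)
        mask.append(row)
    return mask


def cache_entries(n_ref, n_gen, window=None):
    """Key/value entries held after n_gen generated tokens.

    window=None models ordinary full attention, where nothing is dropped.
    """
    kept = n_gen if window is None else min(n_gen, window)
    return n_ref + kept


if __name__ == "__main__":
    for row in rswa_mask(n_ref=3, n_gen=5, window=2):
        print("".join("x" if visible else "." for visible in row))
    print(cache_entries(1000, 50000))       # full attention
    print(cache_entries(1000, 50000, 128))  # windowed generation
```

Under these assumptions the cache stops growing with output length: it holds the reference tokens plus at most `window` generated tokens, however long the transcription gets.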
You have the overriding context: facts that rarely change, such as the participants' names and their backgrounds.
Then you have some very fine-grained facts (what they ate for breakfast this morning), which might be useful right now but are irrelevant in the longer term except as part of a general trend.
When trying to reconstruct a conversation, you really need to find the right balance between the two without pulling in everything that has ever been discussed.
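As a rough illustration of that balance, here is a toy two-tier memory: stable facts are always kept, while fine-grained facts live in a small window and the oldest ones fall out. The class and method names are made up for this sketch, and the fixed-size window is just one assumed policy among many.

```python
from collections import deque


class ConversationMemory:
    """Toy two-tier memory (hypothetical, for illustration only)."""

    def __init__(self, recent_limit=3):
        # Slow-changing facts: names, backgrounds, and so on.
        self.stable = {}
        # Fine-grained facts: only the newest `recent_limit` are kept.
        self.recent = deque(maxlen=recent_limit)

    def remember_stable(self, key, value):
        self.stable[key] = value

    def remember_recent(self, fact):
        # Appending to a full deque silently drops the oldest entry.
        self.recent.append(fact)

    def build_context(self):
        """Stable facts first, then whatever recent facts are still in the window."""
        lines = [f"{key}: {value}" for key, value in self.stable.items()]
        lines.extend(self.recent)
        return lines


if __name__ == "__main__":
    memory = ConversationMemory(recent_limit=2)
    memory.remember_stable("name", "Alice")
    for fact in ["had toast for breakfast", "is running late", "wants to reschedule"]:
        memory.remember_recent(fact)
    print(memory.build_context())
```

A real system would probably also summarize dropped facts into a longer-term trend rather than discard them outright, which this sketch does not attempt.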
This is definitely worth further investigation.