HN Mirror

by robotswantdata 3 hours ago

Very interesting.

The way I understand it, the researchers found a clever architectural hack that stops the AI from hoarding memory when reading long documents.

Normally, when an AI transcribes a 100-page PDF, it tries to remember every single word it has already ingested. This short-term memory (the KV cache) grows linearly, O(N), until the model runs out of VRAM and crashes (or hits a cap). To avoid this, developers are forced to write janky code that chops PDFs into individual pages, processes them one by one, and glues the text back together.
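
To put rough numbers on the linear growth: a standard transformer KV cache stores one key and one value vector per layer, per KV head, per token. The sketch below assumes a hypothetical model shape (32 layers, 8 KV heads, head_dim 128, fp16), under which each token costs 131,072 bytes (128 KiB), so 10× the tokens means 10× the cache. It ignores model weights, activations, and batch size.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Memory held by a standard transformer KV cache for seq_len tokens."""
    # Factor of 2: each layer stores one key and one value vector
    # per KV head per token.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem


if __name__ == "__main__":
    # Hypothetical model shape: 32 layers, 8 KV heads, head_dim 128, fp16 (2 bytes).
    for tokens in (1_000, 10_000, 100_000):
        gib = kv_cache_bytes(32, 8, 128, tokens) / 2**30
        print(f"{tokens:>7} tokens -> {gib:.2f} GiB")
```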

Unlimited OCR uses Reference Sliding Window Attention (R-SWA) to split the AI's focus into two paths:

Global Reference: The AI keeps full, uncompromised sight of the original document image so it never loses context.

Local Generation: The AI restricts its memory of its own generated text to a tight, moving window (e.g. the last 128 tokens) and safely forgets the rest.
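
A minimal sketch of the two paths, based only on the description above and not on the paper's actual code: every generated token may attend to all reference (document image) tokens, but only to the most recent `window` generated tokens, itself included. `r_swa_mask` and `cached_kv_entries` are hypothetical names, and the eviction of out-of-window entries is an assumption.

```python
def r_swa_mask(num_ref: int, num_gen: int, window: int) -> list[list[bool]]:
    """Boolean attention mask for generated tokens under a sketched R-SWA.

    Row i is generated token i. Columns are the num_ref reference tokens
    followed by the num_gen generated tokens. True means "may attend".
    """
    mask = []
    for i in range(num_gen):
        # Global reference path: every generated token sees the whole document.
        row = [True] * num_ref
        # Local generation path: causal, and only the last `window` generated
        # tokens (including token i itself) stay visible.
        row.extend(i - window < j <= i for j in range(num_gen))
        mask.append(row)
    return mask


def cached_kv_entries(num_ref: int, tokens_generated: int, window: int) -> int:
    """KV entries kept at one step, assuming out-of-window generated tokens are evicted."""
    return num_ref + min(tokens_generated, window)


if __name__ == "__main__":
    for row in r_swa_mask(num_ref=3, num_gen=5, window=2):
        print("".join("x" if allowed else "." for allowed in row))
    # Cache size stops growing once the local window is full.
    print([cached_kv_entries(1000, t, 128) for t in (10, 128, 10_000)])
```

With a hypothetical 1,000 reference tokens and a 128-token window, the cache holds 1,010 entries after 10 generated tokens and stays at 1,128 from token 128 onward, instead of growing with the length of the transcription.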

This will be very interesting for local AI, and I can't wait to see what the community builds on top of it!

_puk 1 hour ago

This hits a sweet spot for conversations too, I think. I've been experimenting (for quite a while) with encapsulating long-running conversations.

You have the overriding context: facts that rarely change, such as the participants' names, their backgrounds, etc.

Then you have some very fine-grained facts (what they ate for breakfast this morning) which might be useful right now but are irrelevant over the longer term, except as part of a general trend.

When trying to reconstruct a conversation you really need to find the right balance without pulling in everything that has ever been discussed.

This is definitely worth further investigation.
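
One hedged way to picture this split, loosely mirroring the two R-SWA paths: a pinned set of slow-changing facts (the analogue of the global reference) plus a fixed-size window of recent turns. `ConversationContext` and its methods are hypothetical, and rolling evicted turns up into longer-term trends is not shown.

```python
from collections import deque


class ConversationContext:
    """Hypothetical helper: pinned facts plus a sliding window of recent turns."""

    def __init__(self, window: int) -> None:
        # Slow-changing facts (names, backgrounds): always included,
        # like the global reference.
        self.facts: dict[str, str] = {}
        # Fine-grained recent turns: deque(maxlen=...) silently drops the oldest.
        self.recent: deque[str] = deque(maxlen=window)

    def set_fact(self, key: str, value: str) -> None:
        self.facts[key] = value

    def add_turn(self, turn: str) -> None:
        self.recent.append(turn)

    def build_context(self) -> str:
        """Assemble what would be sent to the model for the next turn."""
        fact_lines = [f"{key}: {value}" for key, value in sorted(self.facts.items())]
        return "\n".join(["# Facts", *fact_lines, "# Recent turns", *self.recent])


if __name__ == "__main__":
    ctx = ConversationContext(window=2)
    ctx.set_fact("alice", "backend engineer")
    for turn in ["alice: had toast", "bob: nice", "alice: anyway, the deploy..."]:
        ctx.add_turn(turn)
    print(ctx.build_context())
```

The design choice is that old turns fall out of the window entirely; anything worth keeping long-term has to be promoted into `facts` explicitly.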

timwis 8 minutes ago

Can you say more about how this applies to long-running conversations? I've been thinking about them as well, but can't quite wrap my head around how this would be better than (or even different from) standard compaction.

ewild 1 hour ago

This sounds like we are trying to add an LSTM into a transformer.

htrp 48 minutes ago

Sepp would like a word

d675 2 hours ago

See, leetcode is useful. As I do this leetcode grind, I've been learning why these techniques exist and how they're used IRL. Lots of interesting stuff there.

ai_fry_ur_brain 2 hours ago

Who said it wasn't useful? Don't listen to those people.

Xevion 1 hour ago

People applying for jobs who get tested with LeetCode problems to assess their skill level, even though the two aren't really correlated with each other or relevant to the position.

galbar 1 hour ago

As someone that gets very annoyed when having to do LeetCode in interviews...

Knowing algorithms, data structures, and their memory and time complexities is very relevant for SWE. I've had teammates who didn't understand them, and everything was fine until it wasn't (scaling and performance issues).

Or, as I put it to a teammate: "Would you rather review the PR of someone who understands the difference between a set and a list, or the PR of someone who doesn't?" This was after we interviewed a candidate with, on paper, ~15 YoE who didn't know the difference.
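
A small illustration of why the set/list difference shows up in code review: both functions below return the same order-preserving result, but the list version does a linear scan on every membership check, while the set version uses an average O(1) hash lookup. The function names are hypothetical; the set version requires hashable items.

```python
def dedupe_with_list(items: list) -> list:
    """Order-preserving dedupe; `x not in seen` scans the whole list each time."""
    seen: list = []
    out: list = []
    for x in items:
        if x not in seen:  # O(len(seen)) linear scan
            seen.append(x)
            out.append(x)
    return out


def dedupe_with_set(items: list) -> list:
    """Same result, but membership checks are average O(1) hash lookups."""
    seen: set = set()
    out: list = []
    for x in items:
        if x not in seen:  # average O(1); items must be hashable
            seen.add(x)
            out.append(x)
    return out
```

For n mostly-unique items, the list version performs on the order of n²/2 comparisons, while the set version stays roughly linear. That gap is invisible in tests with ten items and very visible in production with a million.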

elliottcarlson 1 hour ago

> Knowing algorithms, data structures and their memory and time complexities is very relevant for SWE

Agree with this; however, knowing how to roll your own BFS/LRU/etc. isn't. In that case I'd rather review the PR of someone who knows how to leverage tested, well-known implementations than the PR of someone who decided to roll their own.
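
As a concrete case of leaning on a tested implementation: Python's standard library already ships an LRU cache as `functools.lru_cache`. `load_profile` and the `loads` list are hypothetical stand-ins for an expensive lookup and a record of when it actually runs.

```python
from functools import lru_cache

# Records which names actually reached the (hypothetical) slow path.
loads: list[str] = []


@lru_cache(maxsize=2)  # keep only the 2 most recently used results
def load_profile(name: str) -> str:
    """Hypothetical expensive lookup; the body runs only on cache misses."""
    loads.append(name)
    return name.upper()


if __name__ == "__main__":
    for name in ["a", "b", "a", "c", "b", "c"]:
        load_profile(name)
    print(loads)
    print(load_profile.cache_info())
```

Here `loads` ends up as `['a', 'b', 'c', 'b']`: when `"c"` arrives, `"b"` is the least recently used entry, so it is evicted and has to be loaded again. The eviction policy, statistics, and thread safety come with the standard library rather than a hand-rolled linked list plus dict.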

ai_fry_ur_brain 24 minutes ago

Who cares if the leetcode question doesn't relate to the job itself? It shows whether the person is willing to put in the work and gives you a glimpse into their ability to reason about hard problems.
