| HN Mirror


	by wavemode 930 days ago
	Intriguing but understandable. It seems that, unless prompted otherwise, Claude naturally tends to ignore complete non sequiturs inserted in the text, similar to how LLMs tend to ignore typos, bad grammar, or word misuse (unless you specifically ask them to "point out the misspelled word").

2 comments

nathanfig 930 days ago

Scaling context is not something humans have good intuition for; I certainly don't recall an exact sentence from 200 pages ago. This is an area where we actually want the models to not mimic us.

pixl97 929 days ago

We'll need some kind of hybrid system to deal with this. For example, the LLM 'indexes' the text it reads and assigns importance weights to parts of it, then as it moves to new text it can check back to these more important parts to ensure it's not forgetting things.
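
A minimal sketch of the indexing idea above, not an existing system. The names `Note` and `ImportanceIndex` are hypothetical, and the weights are passed in by the caller; in the proposal the LLM itself would assign them.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Note:
    weight: float                          # only the weight is used for ordering
    position: int = field(compare=False)   # where the chunk appeared in the text
    text: str = field(compare=False)


class ImportanceIndex:
    """Keeps only the `capacity` highest-weighted chunks seen so far."""

    def __init__(self, capacity=3):
        self.capacity = capacity
        self._heap = []  # min-heap, so the lowest-weight note is evicted first

    def add(self, position, text, weight):
        note = Note(weight, position, text)
        if len(self._heap) < self.capacity:
            heapq.heappush(self._heap, note)
        else:
            # Push the new note, then drop whichever note now has the lowest weight.
            heapq.heappushpop(self._heap, note)

    def recall(self):
        """Return the retained chunks in their original reading order."""
        return [n.text for n in sorted(self._heap, key=lambda n: n.position)]
```

While reading new text, the model could call `recall()` periodically and re-read only those chunks instead of the whole context.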

basch 929 days ago

I would think there is some benefit to synthesizing and compressing. Summarization is similar in that the more heavily weighted text remains and the rest is pruned.

If the same basic information is all over a text, combine it.

jafitc 929 days ago

We already know LLMs are good at summarizing.

Question is how good they are at retaining minute details from extremely long context, say 200k tokens.

That’s the frontier Claude and now GPT-4 Turbo are pushing.

basch 929 days ago

I guess I’m proposing a new compression, new substitutions, the LLM inventing new words to compress common ideas. A bytecode, if you will. Compiling the context down.
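
A toy illustration of the substitution idea, using plain dictionary replacement as a stand-in. The functions `compress` and `expand` are hypothetical, the phrase list is supplied by hand rather than invented by an LLM, and the sketch assumes the input text never contains markers like `<0>`.

```python
def compress(text, phrases):
    """Replace each listed phrase with a short marker and return the codebook."""
    codebook = {}
    for i, phrase in enumerate(phrases):
        marker = f"<{i}>"  # assumed not to occur in the original text
        if phrase in text:
            text = text.replace(phrase, marker)
            codebook[marker] = phrase
    return text, codebook


def expand(text, codebook):
    """Undo compress() by substituting each marker with its phrase."""
    for marker, phrase in codebook.items():
        text = text.replace(marker, phrase)
    return text
```

The compressed text plus the codebook plays the role of the "bytecode": shorter to carry around, and reversible as long as the markers stay unambiguous.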

jafitc 929 days ago

Interestingly human memory works the other way.

We tend to remember out of place things more often.

E.g. if there was a kid in a pink hat and blue mustache at a suit and tie business party, everybody is going to remember the outlier.

GTP 929 days ago

But is it actually that useful to remember the exact words?

SheinhardtWigCo 929 days ago

RLHF is probably the reason for this.
