HN Mirror



	by funnygiraffe 743 days ago
	I was under the impression that with LLMs, in order to get high-quality answers, it's always best to keep context short. Is that not the case anymore? Does Claude, under this usage paradigm, not struggle with very long contexts in the ways described in, for example, the "lost in the middle" paper (https://arxiv.org/abs/2307.03172)?

4 comments

azeirah 742 days ago

The conclusion you walked away with is the opposite of what usually works in practice.

The more relevant context you give the LLM, the better.

The key takeaway from that paper is to keep your instructions/questions/direction at the beginning or at the end of the context. The rest of the information can go anywhere.

Not to be too dismissive, it's a good paper, but we're a year on, and in practice this issue seems to have been tackled by training on better data.

This can differ a lot depending on which model you're using, but in the case of Claude 3.5 Sonnet, more relevant context is generally better for everything except speed.

It does remain true, however, that you need to keep your most important instructions at the beginning or at the end.
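
A minimal sketch of that placement advice, assuming you assemble the prompt as plain text yourself. The function name `build_prompt`, the document separators, and the "Reminder of the task" wording are all hypothetical choices for illustration, not something from the paper or from any model's API:

```python
def build_prompt(instructions: str, documents: list[str], repeat_at_end: bool = True) -> str:
    """Put the instructions first, the bulk context in the middle,
    and optionally restate the instructions at the very end."""
    task = instructions.strip()
    parts = [task]
    # The supporting material goes in the middle, in whatever order you have it.
    for i, doc in enumerate(documents, start=1):
        parts.append(f"--- document {i} ---\n{doc.strip()}")
    # Restating the task keeps it at an edge of the context, not buried mid-prompt.
    if repeat_at_end:
        parts.append(f"Reminder of the task: {task}")
    return "\n\n".join(parts)


if __name__ == "__main__":
    print(build_prompt("Summarize the findings.", ["First paper text", "Second paper text"]))
```

Whether restating the instructions at the end helps will depend on the model; it is cheap to try both ways on your own use case.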


rvnx 742 days ago

That was true at the beginning: the longer the context, the more lost the LLM got. But the new models can retrieve information from anywhere in the context.

cf.

https://pbs.twimg.com/media/GH2NJMxbYAAcRL3?format=jpg&name=...


throwup238 743 days ago

I don't have the time to evaluate the effects of context length on my use cases, so I have no idea. There might be some degradation when I attach the Qt book, which is probably already in Claude's training data, but when using it against my private code base, it's not like I have any other choice.

In my experience, the UX of dragging and dropping a few monolithic markdown files to include entire chunks of a large project outweighs the downsides of including irrelevant context.
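
For anyone who wants to try the monolithic-markdown approach, here is a rough sketch of one way to bundle a project directory into a single markdown string. The function name `bundle_markdown`, the default file patterns, and the heading-per-file layout are assumptions for illustration, not a description of the commenter's actual tooling:

```python
from pathlib import Path

FENCE = "`" * 3  # built at runtime so this example stays valid inside a fenced block


def bundle_markdown(root: str, patterns: tuple[str, ...] = ("*.py", "*.md")) -> str:
    """Concatenate matching files under root into one markdown document,
    with a heading per file and the file body inside a code fence."""
    root_path = Path(root)
    # Collect matches for every pattern, dropping duplicates, in a stable order.
    files = sorted({p for pat in patterns for p in root_path.rglob(pat) if p.is_file()})
    sections = []
    for path in files:
        rel = path.relative_to(root_path).as_posix()
        body = path.read_text(encoding="utf-8", errors="replace").rstrip()
        sections.append(f"## {rel}\n\n{FENCE}\n{body}\n{FENCE}")
    return "\n\n".join(sections)


if __name__ == "__main__":
    # Writes the bundle for the current directory to a hypothetical output file.
    Path("project_bundle.md").write_text(bundle_markdown("."), encoding="utf-8")
```

Files that themselves contain code fences would need a longer outer fence; this sketch does not handle that case.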


inciampati 742 days ago

No, you need to provide as much information in context as possible. Otherwise you are sampling from the mode. "Write me an essay about cows" = garbage, boring, and probably 200 words. "Here are twenty papers about cow evolution, write me an overview of findings" = yes
