HN Mirror



	by saliagato 1056 days ago
	You are correct. The paper is called "Lost in the Middle" [0] and it is probably one of the worst drawbacks of this technology. It makes a lot of use cases biased (think of law). [0] https://arxiv.org/pdf/2307.03172.pdf

2 comments

trsohmers 1055 days ago

The research is slightly misleading: every model they tested had an original pretrained context length significantly shorter than the fine-tuned context length used in the experiments. For example, they used MPT-30B-Instruct, which was pretrained with a 2k sequence length and then fine-tuned to 8k. A real test of whether current self-attention has this issue would be to natively train a model at the extended sequence length.


behnamoh 1056 days ago

> It makes a lot of use cases biased (think of law).

Yes, it's unfortunate. I wonder if GPT-4 with a 32k ctx window is in a sense "smarter" than GPT-4 with 8k ctx.
