| HN Mirror



	by senko 929 days ago
	We've recently tested long-context recall across Claude (2 and Instant) and GPT (3.5 and 4); results are in https://dev.to/zvone187/gpt-4-vs-claude-2-context-recall-ana... Claude 2 beats GPT-4 in recall reliability, but is slower.
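
	For readers who want to try a similar recall check themselves, here is a minimal sketch of how such a probe could be built. It is not the method from the linked article: the helper names `build_recall_prompt` and `recalled`, the filler text, and the "needle" sentence are all hypothetical, and sending the prompt to a model is left to whichever client you use.

```python
def build_recall_prompt(filler: str, needle: str, depth: float, total_chars: int) -> str:
    """Bury `needle` inside repeated filler text.

    depth is the relative position of the needle: 0.0 = start, 1.0 = end.
    total_chars is the length of the filler portion only (hypothetical knob;
    a real test would more likely budget in tokens, not characters).
    """
    if not filler:
        raise ValueError("filler must be non-empty")
    if not 0.0 <= depth <= 1.0:
        raise ValueError("depth must be between 0.0 and 1.0")
    # Repeat the filler until it is long enough, then trim to the exact size.
    repeats = total_chars // len(filler) + 1
    haystack = (filler * repeats)[:total_chars]
    cut = int(len(haystack) * depth)
    return haystack[:cut] + "\n" + needle + "\n" + haystack[cut:]


def recalled(answer: str, expected: str) -> bool:
    """Crude scoring: did the model's answer contain the expected fact?"""
    return expected.lower() in answer.lower()


if __name__ == "__main__":
    needle = "The secret code is 4217."
    for depth in (0.0, 0.5, 1.0):
        prompt = build_recall_prompt("Lorem ipsum dolor sit amet. ", needle, depth, 2000)
        # Send `prompt` plus a question such as "What is the secret code?"
        # to the model under test, then score the reply with recalled(reply, "4217").
        print(depth, len(prompt))
```

	Sweeping `depth` and `total_chars` gives a rough picture of where in the context a model starts to lose facts; substring matching is a deliberately simple scoring rule and will miss paraphrased answers.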

3 comments

zwaps 929 days ago

Excellent article. This suggests that GPT's context scaling behaves like RoPE scaling, and that one should not go beyond 2x the original context length.

If Claude 2 has an internal RAG, then this also means that the 200k context length only holds for queries that allow for an out of the box

Thanks for the insights!


dr_kiszonka 929 days ago

One recurring problem I have with Claude 2 is that it sometimes "bugs out" and starts to repeat the same token ad infinitum (which I still have to pay for). This happens with longer prompts, say, 30k. Have you encountered this issue?
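
A minimal sketch of one possible client-side guard, assuming the response is consumed as a stream of tokens: stop reading once the same token has repeated too many times in a row. The function name `stop_on_repetition` and the `max_repeats` threshold are hypothetical, and whether abandoning a stream early actually stops billing depends on the provider.

```python
from typing import Iterable, Iterator, Optional


def stop_on_repetition(tokens: Iterable[str], max_repeats: int = 20) -> Iterator[str]:
    """Yield tokens until one token occurs `max_repeats` times in a row.

    The token that would complete the run is not yielded; the generator
    simply ends, and the caller is expected to cancel the request.
    """
    if max_repeats < 2:
        raise ValueError("max_repeats must be at least 2")
    last: Optional[str] = None
    run = 0
    for tok in tokens:
        # Extend the current run, or start a new one when the token changes.
        run = run + 1 if tok == last else 1
        last = tok
        if run >= max_repeats:
            return
        yield tok


if __name__ == "__main__":
    # Simulated stream: normal output followed by a runaway repeated token.
    stream = ["The", " label", " is", " spam"] + [" spam"] * 100
    print("".join(stop_on_repetition(stream, max_repeats=5)))
```

This only catches a single token repeated back to back; a loop over a longer phrase would need a check on repeated n-grams instead.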


senko 929 days ago

I haven't, but tbh we work a lot more with GPT than Claude so it's possible I haven't encountered many warts there.

For what we do (AI code writing), GPT output seems qualitatively much better than Claude's, but we want to keep our options open.


dr_kiszonka 929 days ago

Thanks!

I use it for classification for a personal project (non-commercial) and, for me, they are both pretty close in terms of quality. GPT-4 is better, but has a shorter window. I was hoping to reduce costs by using Claude exclusively, but that bug makes it too unreliable, sadly.


jafitc 929 days ago

My experience matched this as well.

GPT-4 Turbo is more watered down on the details with long context.

But it's also a newer feature for OpenAI, so they might catch up with the next version.
