by senko 929 days ago
We recently tested long-context recall across Claude (2 and Instant) and GPT (3.5 and 4); the results are in https://dev.to/zvone187/gpt-4-vs-claude-2-context-recall-ana...

Claude 2 beats GPT-4 in recall reliability, but is slower.

3 comments

Excellent article. This suggests that GPT's context scaling behaves like RoPE scaling, and that one should not go beyond 2x the original context length.

If Claude 2 has an internal RAG, then this also means that the 200k context length only holds for queries that allow for an out of the box

Thanks for the insights!

One recurring problem I have with Claude 2 is that it sometimes "bugs out" and starts to repeat the same token ad infinitum (which I still have to pay for). This happens with longer prompts, say, 30k. Have you encountered this issue?
I haven't, but tbh we work a lot more with GPT than Claude so it's possible I haven't encountered many warts there.

For what we do (AI code writing), GPT output seems qualitatively much better than Claude's, but we want to keep our options open.

Thanks!

I use it for classification for a personal project (non-commercial) and, for me, they are both pretty close in terms of quality. GPT-4 is better, but has a shorter window. I was hoping to reduce costs by using Claude exclusively, but that bug makes it too unreliable, sadly.

My experience matched this as well.

GPT-4 Turbo is more watered down on the details with long context.

But it's also a newer feature for OpenAI, so they might catch up with the next version.