| HN Mirror


by jafitc 930 days ago

No, what it’s showing is that the synthetic tests where Claude didn’t perform well can still be passed if it’s prompted right.

But at the end of the day the test was still synthetic!

It placed out-of-context sentences in a 200k-token document, needle-in-a-haystack style.

Claude is still very powerful for extracting data from 200k tokens when it’s real-world data and real questions (not an adversarial synthetic test).


zwaps 929 days ago

This needs to be shown. For example, asking for something that is clearly in the training data (like Paul Graham's CV) is certainly not a proper way to test context recall.


jafitc 929 days ago

Link from the thread: https://dev.to/zvone187/gpt-4-vs-claude-2-context-recall-ana...


mejutoco 929 days ago

Could we feed it Anna Karenina and ask it what the difference is between happy and unhappy families?


jafitc 929 days ago

Isn’t that the first sentence?


mejutoco 929 days ago

That is the point. It is a long book, so this checks whether the model still remembers the first sentence across the long context. Or do you mean that, as a test, it is better to place the "needle" randomly?


zwaps 929 days ago

It was trained on this book, so again, this is not a good test.

It will know the answer even without the book.
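
For illustration only, here is a minimal sketch of a recall test that avoids both objections raised in this thread: the needle is a randomly generated fact, so it cannot be in the training data, and it is inserted at a chosen relative depth rather than always in the first sentence. The function names, the locker sentence, and the substring-based scoring are all assumptions made up for this example and do not come from the linked analysis.

```python
import random


def build_haystack_prompt(filler_paragraphs, depth, rng):
    """Insert a made-up fact at a relative depth (0.0 = start, 1.0 = end).

    Hypothetical helper: returns the full prompt and the secret to look for.
    """
    # A random hex code cannot have been memorized during training, so a
    # correct answer has to come from the supplied context.
    secret = f"{rng.getrandbits(32):08x}"
    needle = f"The access code for the blue locker is {secret}."
    position = round(depth * len(filler_paragraphs))
    paragraphs = (
        filler_paragraphs[:position] + [needle] + filler_paragraphs[position:]
    )
    question = "What is the access code for the blue locker?"
    prompt = "\n\n".join(paragraphs) + "\n\n" + question
    return prompt, secret


def is_recalled(model_answer, secret):
    """Naive scoring (an assumption): the answer must contain the secret."""
    return secret in model_answer
```

To use it, one would call `build_haystack_prompt` with real filler text and several depths (for example 0.0, 0.25, 0.5, 0.75, 1.0), send each prompt to the model under test, and score the reply with `is_recalled`. Sweeping the depth is what distinguishes "remembers the first sentence" from recall across the whole context.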
