Hacker News
by jafitc 930 days ago
No, what it’s showing is that synthetic tests where Claude didn’t perform well can still work if prompted right.

But at the end of the day the test was still synthetic!

Placing out-of-context things in a 200k document, needle in a haystack style.

Claude is still very, very powerful for extracting data from a 200k document when it’s real-world data and real questions (not an adversarial synthetic test).


This needs to be shown. For example, asking for something that is clearly in the training data (like Paul Graham's CV) is certainly not a proper way to test context recall.
Could we feed it Anna Karenina and ask it what the difference is between happy and unhappy families?
Isn’t that the first sentence?
That is the point. Long book, checking the long context to see if it remembers the first sentence. Or do you mean that, as a test, it is better to randomly place the "needle"?
It was trained on this book, so again, this is not a good test.

It will know the answer even without the book.
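
One way around the training-data objection, as a rough sketch rather than anyone's actual benchmark: use a made-up fact as the needle, so the model cannot answer from memory, and place it at a chosen or random depth in the filler text. Everything here is hypothetical and only for illustration: the function names, the "Zorvath archive" sentence, and the substring check for recall. Sending the document to a model is left out on purpose.

```python
import random


def make_needle(rng):
    """Return (needle_sentence, expected_answer) for an invented fact.

    The code is random, so it cannot appear in any training data.
    """
    code = str(rng.randint(100000, 999999))
    return f"The access code for the Zorvath archive is {code}.", code


def build_haystack(filler_sentences, needle, depth=None, rng=None):
    """Insert the needle into the filler and return (document, index).

    depth is a fraction in [0.0, 1.0]: 0.0 puts the needle first,
    1.0 puts it last. depth=None picks a random position.
    """
    rng = rng or random.Random()
    if depth is None:
        index = rng.randint(0, len(filler_sentences))
    else:
        if not 0.0 <= depth <= 1.0:
            raise ValueError("depth must be between 0.0 and 1.0")
        index = int(depth * len(filler_sentences))
    sentences = filler_sentences[:index] + [needle] + filler_sentences[index:]
    return " ".join(sentences), index


def recalled(model_answer, expected):
    """Crude scoring: did the answer contain the expected string?"""
    return expected in model_answer


if __name__ == "__main__":
    rng = random.Random(42)
    needle, expected = make_needle(rng)
    filler = [f"Filler sentence number {i}." for i in range(1000)]
    document, index = build_haystack(filler, needle, depth=None, rng=rng)
    question = "What is the access code for the Zorvath archive?"
    print(f"Needle placed at sentence {index} of {len(filler)}")
    print(f"Ask the model: {question} (expected answer: {expected})")
```

This still leaves the original complaint standing: the needle is out of context relative to the filler, so it remains a synthetic test. It only removes the "it already knows the answer" problem.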