by jafitc
930 days ago
No, what it’s showing is that synthetic tests where Claude didn’t perform well can still work if prompted right. But at the end of the day, the test was still synthetic: placing out-of-context things in a 200k-token document, needle-in-a-haystack style. Claude is still very powerful for extracting data from 200k tokens when it’s real-world data and real questions (not an adversarial synthetic test).