by msp26
779 days ago
Can someone test this with RULER, please? https://github.com/hsiehjackson/RULER In practice, all of these long-context models show degraded performance (there's a table in the repo). For my NLP work, I find that GPT-4-turbo is much worse past roughly 32k tokens.
If you have other suggestions, I'd be happy to chat further.