by msp26 779 days ago
Can someone test this with RULER please? https://github.com/hsiehjackson/RULER

In practice, all of these long-context models show degraded performance (there's a table in the repo). For my NLP work I find that GPT-4-turbo is much worse after 32k-ish tokens.


Hi, Leo, chief scientist @ Gradient, here. We've been eagerly awaiting the release of RULER's code ourselves! As mentioned below, we wanted to release a model to the community asap, and have plans already for further fine-tuning & more sophisticated evals.

If you have other suggestions, I'd be happy to chat further.

Hi!

Unless I'm missing something, they did add the eval scripts to that repo 4 days ago.

Waiting until 4 days ago =)