| HN Mirror



	by msp26 779 days ago
	Can someone test this with RULER, please? https://github.com/hsiehjackson/RULER In practice, all of these long-context models show degraded performance (there's a table in the repo). For my NLP work, I find that GPT-4-turbo is much worse after 32k-ish tokens.

1 comment

leonid_pekelis 779 days ago

Hi, Leo, chief scientist @ Gradient, here. We've been eagerly awaiting the release of RULER's code ourselves! As mentioned below, we wanted to release a model to the community asap, and have plans already for further fine-tuning & more sophisticated evals.

If you have other suggestions, I'd be happy to chat further.

msp26 779 days ago

Hi!

Unless I'm missing something, they did add the eval scripts to that repo 4 days ago.

leonid_pekelis 778 days ago

Waiting until 4 days ago =)
