Hacker News
by smusamashah 651 days ago
It should be benchmarked against something like RULER [1].

1: https://github.com/hsiehjackson/RULER (RULER: What’s the Real Context Size of Your Long-Context Language Models)

1 comment

> To incorporate this, we ask the model to complete a chain of hashes instead (as recently proposed by RULER):

They did mention it, but didn't provide concrete benchmarks.