by jazzypants 35 days ago
I'm not GP, but I would want a benchmark that actually tests the entire context window. One that only tests the first 128K tokens tells us effectively nothing about how well the model works at its full capacity.

That makes sense! We are working on that.