Hacker News
by
jazzypants
35 days ago
I'm not GP, but I would want a benchmark that actually tests the entire context window. A benchmark that only tests the first 128K tokens tells us effectively nothing about how well the model works at its full capacity.
alexsubq
33 days ago
That makes sense! We are working on that.