HN Mirror

	by jazzypants 82 days ago
	I'm not GP, but I would want a benchmark that actually tests the entire context window. A benchmark that only tests the first 128K tokens effectively tells us nothing about how well it works at its full capacity.

1 comment

That makes sense! We are working on that.