HN Mirror


by spacedoutman 66 days ago

This research is useless and nearly all other LLM research is too.

GPT-5.2 is the strongest model they tested, and it is nearly 6 months old.

Traditional research cannot keep up.

2 comments

acgourley 66 days ago

I disagree; their findings should generalize to the frontier. Even if the latest models can deal with the extra complexity, it stands to reason that they will take more tokens to do less. This could be a useful insight for the next generation of evals.

abujazar 65 days ago

Agreed. As Simon Willison points out, November 2025 was a critical month because that's pretty much when coding agents became «good enough», eliminating most of the problems pointed out in this study.

anygivnthursday 65 days ago

I regularly see Claude Opus 4.7 dropping constraints from an otherwise small CLAUDE.md at merely 20% context use. I have to keep reminding it. It has all the info ready in its context, yet from time to time it still decides to ignore parts.

sanxiyn 65 days ago

GPT-5.2 was released after November 2025.
