| HN Mirror


	by vessenes 57 days ago
	Yeah, I find this current LLM voice very tiring to read; I get enough of it day-to-day wrangling Claude and others. I don’t think ‘writing’ this took very much work, though; the prompt was probably something like “read the research logs, and write a blog post with charts showing our amazing results and hammering on the idea that verifiers matter”. The rest you could go have a coffee for. That said, the core idea of this, that verification matters a lot, is well taken, and in fact, this is totally awesome in terms of results. They mention at the end they’re not sure how much of this is microtuned against the benchmark, a sin that many CPU companies cheerfully commit and have committed over the last 40 years btw, so I’d be interested in a followup with more general benchmarking. Either way, amazing.

1 comment

fesens 57 days ago

Yeah, you are totally right. It's a work in progress, and the post was written by an LLM - I'm trying to improve on it (dash pun intended).

Regarding the benchmark overfitting, absolutely, it's pretty much overfitted. This CPU will only be as good as its benchmark. If I have the time I will try to get some applications and optimize for those.
