


	by dontreact 1653 days ago
	The name of the game here is generality. For a really general agent, they are looking to have superhuman performance, not to reach state of the art on every individual task. Beating Stockfish 8 convinces me that it would be superhuman at chess.


remram 1653 days ago

They could still be honest that it's Stockfish 8, not the Stockfish everyone has. Your product having genuine value does not excuse lying about that value.


Skyy93 1653 days ago

I have observed this kind of behavior in many papers nowadays. This is extremely painful for research, because some better candidates could be overlooked, and FAANG publishes a majority of the papers in the ML section. It's a mess.


ShamelessC 1653 days ago

They were? They say they use Stockfish 8 the very first time they mention it.


remram 1653 days ago

First time they mention it is page 10:

> one of the strongest and most widely-used programs is Stockfish [81].

Here's the citation, note the date:

> [81] The Stockfish Development Team. Stockfish: Open source chess engine, 2021. https://stockfishchess.org/.

They mention the version number only once, further down, and don't point out that it has been out of date since February 2018. None of the other 11 mentions include the version number, as in this sentence:

> In Chess, PoG(60000,10) is stronger than Stockfish using 4 threads and one second of search time.


hesperiidae 1653 days ago

>First time they mention it is page 10:

Yeah, so it is! I guess I ran into the same weirdness as ShamelessC, since when I first Ctrl-F'ed the PDF, hit 1/11 was on page 11. Now that I try my damnedest to reproduce it, I get 12 hits and the first is that one on page 10.


hesperiidae 1653 days ago

Yup, "In chess, we evaluated PoG against Stockfish 8, level 20 [81] and AlphaZero."
