| HN Mirror



	by tunesmith 934 days ago
	Yeah, I just saw the video from that researcher (later an OpenAI researcher?) who talked about it back in 2016... not that I understood much, but it definitely seemed that Q* was a generalization of the Q algorithm described on the previous slide. The optimum something across all somethings.

2 comments

LeCun: Please ignore the deluge of complete nonsense about Q*. https://twitter.com/ylecun/status/1728126868342145481

As someone with a borderline acceptable understanding of RL, this is the most accurate take so far.

If possible, I would be quite interested in a link to the video, or alternatively the name of the researcher you mention.

It's Noam Brown. He previously worked on Cicero at Meta AI, and on no-limit poker AI before that.