


	by naturalgradient 2903 days ago
	This is a weirdly shallow article containing lots of diagrams and bullet points that just summarize the known points that RL needs a lot of data and needs to learn from scratch. No mention of all the ongoing work in learning from demonstrations, or more generally incorporating any off-policy knowledge. Vague speculations about the philosophy of model-free learning. Not really worth the read (as someone working in RL).


andreyk 2903 days ago

All that stuff is in part two! https://thegradient.pub/how-to-fix-rl/

Says as much at the end... to be fair, we did warn up front: "The first part, which you're reading right now, will set up what RL is and why it is fundamentally flawed. It will contain some explanation that can be skipped by AI practitioners." But personally I think the board game allegory is fun, and that most people tend to forget the categorical simplicity of Go and Atari games and overhype them; it is easy to say the main points are not new, but the details are important here.


jonnycomputer 2903 days ago

calling model-free RL "fundamentally flawed" is just click-baiting. too bad it worked on me; but I was hoping for insight.


ddoolin 2903 days ago

In your opinion, is this a solution to the "AI winter" that is often talked about? I'm an engineer, though not involved in AI, and things like meta-reinforcement learning seem, from the info/perspective you've given, to address the problem, at least partially.


andreyk 2903 days ago

I think an AI winter is unlikely to come about this time, since non-RL stuff (supervised learning) has been so successful and useful.


Iv 2903 days ago

Yes, some techs are overhyped (chatbots, finance stuff), but deep learning has delivered a lot of incredible working applications. It is not just hot air or marketing hype.


backpropaganda 2903 days ago

Expert systems were not just hot air or marketing hype. Usefulness of a subset of new AI technology is irrelevant. A winter or contraction is caused by expectations not being met, and it seems, at least to me, that investors/funders have already started expecting superhuman performance in image/speech recognition, and there's a lot of expectation even in robotics, which will probably not be met by actual results any time soon.
