| HN Mirror


	by bitL 2002 days ago
	RL needs a supercomputer and its code is usually too fragile: making a trivial mistake anywhere (missing a constant multiplication, swapping the order of two consecutive lines of code, etc.) would likely lead to your model never converging, even if you got everything else right.

4 comments

chasely 2002 days ago

The hard part of RL for the problems I've encountered in my work is that you need a simulator. Building a reliable and accurate simulator is often an immense undertaking.

dgb23 2002 days ago

Maybe data scientists should team up (more?) with game programmers. They have a ton of experience in building very complex simulations.

Ma8ee 2002 days ago

Which code is not fragile in that sense? I think that is a rather strange criticism.

Iv 2002 days ago

You can do RL on a Raspberry Pi. It depends on what problem you are trying to solve, but not all of them require video analysis and billions of parameters.

cbames89 2002 days ago

Technical point: Value functions that are positive constant multiples of each other result in the same behavior.
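
A minimal sketch of why that holds for a greedy policy, assuming a small made-up action-value table (the numbers, `greedy_policy`, and `scale` are hypothetical and only for illustration). Scaling every value by a positive constant leaves the best action in each state unchanged; a negative constant is included to show that the sign matters.

```python
# Hypothetical action values: rows are states, columns are actions.
Q = [
    [1.0, 2.5, 0.3],
    [0.2, -1.0, 0.1],
    [4.0, 4.5, 3.9],
]

def greedy_policy(q_table):
    """Return the index of the highest-valued action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]

def scale(q_table, c):
    """Multiply every entry of the table by the constant c."""
    return [[c * v for v in row] for row in q_table]

base_policy = greedy_policy(Q)
scaled_policy = greedy_policy(scale(Q, 2.0))    # positive multiple
negated_policy = greedy_policy(scale(Q, -2.0))  # negative multiple

print("base:   ", base_policy)
print("scaled: ", scaled_policy)
print("negated:", negated_policy)
```

The positive multiple preserves the ordering of actions within each state, so the argmax is the same; the negative multiple reverses the ordering and picks the worst action instead.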

bitL 2001 days ago

Making a constant multiplication mistake somewhere in the code doesn't imply the new value function would be a constant multiple of the optimal one.
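
As a rough illustration of that distinction, here is a sketch of value iteration on a tiny made-up deterministic MDP. Everything in it is an assumption for the example: the `TRANSITIONS` table, the discount `GAMMA`, and the `bootstrap_scale` parameter, which stands in for one possible bug (a stray constant multiplying only the bootstrapped term of the update).

```python
GAMMA = 0.9

# Hypothetical MDP: TRANSITIONS[state][action] = (reward, next_state).
# State 0: action 0 grabs reward 1 now; action 1 waits and moves to state 1.
# State 1: pays reward 2 and ends. State 2: absorbing, no reward.
TRANSITIONS = {
    0: {0: (1.0, 2), 1: (0.0, 1)},
    1: {0: (2.0, 2), 1: (2.0, 2)},
    2: {0: (0.0, 2), 1: (0.0, 2)},
}

def value_iteration(bootstrap_scale=1.0, sweeps=50):
    """Run synchronous value iteration; bootstrap_scale != 1.0 simulates the bug."""
    values = {s: 0.0 for s in TRANSITIONS}

    def backup(reward, next_state):
        return reward + bootstrap_scale * GAMMA * values[next_state]

    for _ in range(sweeps):
        values = {
            s: max(backup(r, s2) for r, s2 in actions.values())
            for s, actions in TRANSITIONS.items()
        }
    policy = {
        s: max(actions, key=lambda a: backup(*actions[a]))
        for s, actions in TRANSITIONS.items()
    }
    return values, policy

correct_values, correct_policy = value_iteration()
buggy_values, buggy_policy = value_iteration(bootstrap_scale=0.5)

print("correct:", correct_values, correct_policy)
print("buggy:  ", buggy_values, buggy_policy)
```

In this toy case the buggy run values state 0 at 1.0 instead of 1.8 while state 1 stays at 2.0, so the two value functions are not constant multiples of each other, and the greedy action in state 0 flips from waiting to grabbing. A constant applied uniformly to the rewards would behave differently and only rescale the values.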
