| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sinwave 4211 days ago
	I suspect that the best way to tackle this problem is to treat "learning to walk" as a reinforcement learning problem. This way you can evaluate actions within each episode rather than waiting until the end of each trial to adjust parameters encoding the walking strategy. As karpathy suggests below (and he's certainly much more qualified than me), an evolutionary method such as GA's, while apparently fairly effective, could well be wasting valuable information learned in real time through interaction.