| HN Mirror



	by hcrisp 1204 days ago
	Yes, I would like to see the environment ported to Python, wrapped in a Gym environment, and given a well-shaped reward, e.g. reward = prior_height_delta - (height - target_height) - fuel_cost. Run Stable Baselines PPO or DQN on that and it should converge to something close to an optimized MPC controller.
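	A rough sketch of how that reward could be wired into a Gym-style step loop. Everything beyond the formula itself is an assumption made for illustration: the 1-D physics, the class name ToyLander1D, its parameters, and the reading of prior_height_delta as the previous step's (height - target_height). It uses plain Python and only mimics the reset/step shape of a Gym environment; it is not the lander discussed here.

```python
from dataclasses import dataclass


def shaped_reward(prior_height_delta, height, target_height, fuel_cost):
    """Reward exactly as written in the comment above."""
    return prior_height_delta - (height - target_height) - fuel_cost


@dataclass
class ToyLander1D:
    """Hypothetical 1-D lander with a Gym-like reset/step interface."""

    target_height: float = 0.0
    start_height: float = 10.0
    gravity: float = -1.0       # assumed constant acceleration per step
    thrust: float = 2.0         # assumed acceleration added while burning
    fuel_per_burn: float = 0.1  # assumed fuel cost of one burn
    dt: float = 1.0

    def reset(self):
        self.height = self.start_height
        self.velocity = 0.0
        return (self.height, self.velocity), {}

    def step(self, action):
        # Assumption: prior_height_delta is last step's distance above target.
        prior_delta = self.height - self.target_height

        # action 1 fires the engine, action 0 coasts.
        burning = action == 1
        accel = self.gravity + (self.thrust if burning else 0.0)
        self.velocity += accel * self.dt
        # Clamp at the target so the lander cannot pass through it.
        self.height = max(self.height + self.velocity * self.dt,
                          self.target_height)

        fuel_cost = self.fuel_per_burn if burning else 0.0
        reward = shaped_reward(prior_delta, self.height,
                               self.target_height, fuel_cost)
        terminated = self.height <= self.target_height
        # Gym-style 5-tuple: obs, reward, terminated, truncated, info
        return (self.height, self.velocity), reward, terminated, False, {}
```

	Under that reading the first two terms reduce to the progress made toward the target during the step, so the agent is paid for closing the gap and charged for each burn. Note that this toy version does not penalize a hard touchdown; a real environment would need a velocity term for that.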


paradite 1204 days ago

It is already there, just not this particular implementation (or maybe it is?).

You can run PPO or DQN right now on the OpenAI Gym implementation using Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/

In fact, I previously ran it locally, and PPO solved the problem within 10 minutes of training, with a max reward of about 200.
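For anyone who wants to try something similar, here is a minimal sketch of such a run. It assumes stable-baselines3 and a Gym/Gymnasium install with the Box2D extras. The helper name train_ppo, the environment id "LunarLander-v2" (newer Gymnasium releases register it under a different version suffix), and the timestep budget are assumptions for illustration, not the settings from the run described above.

```python
def train_ppo(env_id="LunarLander-v2", total_timesteps=200_000, n_eval_episodes=10):
    """Train PPO on env_id and return (model, mean_reward, std_reward).

    env_id and total_timesteps are illustrative defaults; adjust env_id
    to whatever your installed Gym/Gymnasium version registers.
    """
    # Imported here so the module loads even without the RL stack installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # Passing the id as a string lets Stable-Baselines3 build the env itself.
    model = PPO("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Average episode return over a few evaluation episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    return model, mean_reward, std_reward
```

Calling train_ppo() and printing the returned mean reward gives a number to compare against the roughly 200 mentioned above; how long that takes and what score it reaches will depend on hardware, seed, and hyperparameters.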


hcrisp 1204 days ago

This is a different lunar lander than the one you may be thinking of. It looks more like SpaceX's Starship than an Apollo lunar module. I don't think it has been made into a gym env yet, but it would be great if it has!
