by hcrisp
1204 days ago
Yes, I would like to see the environment ported to Python, wrapped in Gym, and given a well-shaped reward, e.g. reward = prior_height_delta - (height - target_height) - fuel_cost. Run Stable Baselines PPO or DQN on that and it should converge to something close to an optimized MPC controller.
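To make the reward idea concrete, here is a rough sketch of that formula as a plain Python function. The comment does not define the terms precisely, so a few things here are assumptions: prior_height_delta is read as the previous step's distance from the target, absolute values are used so that overshooting is penalized as well, and fuel_weight is a hypothetical scaling factor that is not in the original formula.

```python
def shaped_reward(prev_height: float,
                  height: float,
                  target_height: float,
                  fuel_used: float,
                  fuel_weight: float = 0.1) -> float:
    """Reward progress toward target_height, minus a fuel penalty.

    Assumed reading of:
        reward = prior_height_delta - (height - target_height) - fuel_cost
    """
    # Distance from the target at the previous step (assumed meaning).
    prior_height_delta = abs(prev_height - target_height)
    # Distance from the target now; abs() is an assumption so that
    # being above or below the target is treated the same way.
    height_delta = abs(height - target_height)
    # fuel_weight is a hypothetical knob, not part of the original formula.
    fuel_cost = fuel_weight * fuel_used
    # Positive when the agent moved closer to the target this step.
    return prior_height_delta - height_delta - fuel_cost
```

With this reading the reward is positive whenever a step closes the gap to the target by more than the fuel it costs, which gives the agent a learning signal on every step instead of only at the end of an episode.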
You can run PPO or DQN right now on the OpenAI Gym implementation using Stable-Baselines3: https://stable-baselines3.readthedocs.io/en/master/
In fact, I previously ran it locally, and PPO solved the problem within 10 minutes of training, with a max reward of about 200.
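For anyone who wants to try this, a minimal sketch of such a run with Stable-Baselines3 might look like the following. It assumes stable-baselines3 and a Gym-compatible environment package are installed. The environment ID is not given above, so it is left as a parameter, and the timestep budget and episode count are placeholder values, not the settings used in the run described.

```python
def train_ppo(env_id: str, total_timesteps: int = 200_000, n_eval_episodes: int = 10):
    """Train PPO on a registered Gym environment and report mean reward.

    env_id must be supplied by the caller; the default budget is a guess.
    """
    # Imported lazily so the module loads even without SB3 installed.
    from stable_baselines3 import PPO
    from stable_baselines3.common.evaluation import evaluate_policy

    # SB3 accepts an environment ID string and builds the env itself.
    model = PPO("MlpPolicy", env_id, verbose=0)
    model.learn(total_timesteps=total_timesteps)

    # Average episode reward of the trained policy over a few episodes.
    mean_reward, std_reward = evaluate_policy(
        model, model.get_env(), n_eval_episodes=n_eval_episodes
    )
    return model, mean_reward, std_reward
```

Call it with whatever ID your installed Gym version registers for the environment, then check whether the returned mean reward gets near the roughly 200 mentioned above. Swapping PPO for DQN only works if the environment has a discrete action space.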