Hacker News
by Tier3r 592 days ago
That seems like a good idea. I'm puzzled about what benefit RL provides in the OP. It seems like a well-defined constrained optimisation problem that could be solved without RL, for example in the way you mentioned.