Does anyone have high-level guidance on when (deep) RL is worth pursuing for optimization (e.g. optimizing algorithm design) rather than other approaches (e.g. genetic algorithms)?
In my experience it's usually less a question of scale than of problem type.
My rule of thumb: RL is worth pursuing when the reward function is easy to specify but there are effectively infinite ways to traverse the action space. If instead the state and action spaces are constrained and there are only a few possible solution paths to traverse, other approaches are usually enough.
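To make that rule of thumb a bit more concrete, here is a rough toy sketch in Python. Everything in it is my own illustration and not taken from any real system: `num_paths`, `brute_force_best`, `toy_reward`, and `TARGET` are hypothetical names, and it assumes fixed-length action sequences. It only shows how quickly the number of paths grows, and that a small space can simply be enumerated against an easy-to-write reward.

```python
from itertools import product

def num_paths(num_actions: int, horizon: int) -> int:
    """Number of distinct fixed-length action sequences."""
    return num_actions ** horizon

def brute_force_best(actions, horizon, reward):
    """Score every sequence exhaustively; only viable when num_paths is small."""
    return max(product(actions, repeat=horizon), key=reward)

# Hypothetical toy reward: how many steps match a hidden target sequence.
TARGET = (1, 0, 2, 1)

def toy_reward(seq):
    return sum(a == b for a, b in zip(seq, TARGET))

# Constrained case: 3 actions, 4 steps. Small enough to enumerate outright.
small = num_paths(3, 4)
best = brute_force_best(range(3), 4, toy_reward)

# Open-ended case: 50 actions, 40 steps. Far too many paths to enumerate,
# which is the regime where a sampled or learned search starts to look attractive.
large = num_paths(50, 40)

print(small, best, large > 10**60)
```

In the first case the reward is easy to write and the space is tiny, so exhaustive search (or a simple heuristic or genetic search) does the job. In the second the same reward is just as easy to write, but enumeration is hopeless.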
Start with a planet-scale computer that makes the marginal cost of RL nearly zero, and at the same time spend so much money on hashing and sorting that the micro-optimization pays off.