Sure, but the whole point is to reduce this kind of search to RL, which is a very general framework. Their paper shows that such a generic approach can solve a very specific problem, and solve it well. But the paper is about improving RL, not about improving sorting.