While the roots have been known for a long time, my impression is that the key paper that started this line of thought was Marco Cuturi's NIPS 2013 paper "Sinkhorn Distances", which is, IMHO, a very nice read.
I may well be missing something, but it seems like the advance in this series of papers is that they figured out a way to calculate a differentiable solution to the sorting problem quickly, whereas it was already known that a differentiable solution existed, no?
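To make the "quickly" part concrete, here is a minimal NumPy sketch of Sinkhorn iterations used as a soft sort. This is my own illustration, not code from any of the papers: the function name `sinkhorn_soft_sort`, the evenly spaced anchors `y`, the squared-distance cost, and the defaults for `eps` and `n_iters` are all assumptions. The idea is to transport the inputs onto an increasing set of anchors with entropic regularization; every step is a smooth operation, so an autodiff framework could differentiate through the loop.

```python
import numpy as np

def sinkhorn_soft_sort(x, eps=0.01, n_iters=200):
    """Hypothetical helper: soft-sort x via entropy-regularized optimal transport."""
    x = np.asarray(x, dtype=float)
    n = x.size
    # Increasing anchor points to transport onto (an assumption of this sketch).
    y = np.linspace(x.min(), x.max(), n)
    # Squared-distance cost between every input and every anchor.
    C = (x[:, None] - y[None, :]) ** 2
    # Gibbs kernel; smaller eps gives a sharper (closer to hard) sort,
    # but very small eps can underflow in this naive, non-log-domain form.
    K = np.exp(-C / eps)
    # Uniform marginals: each input and each anchor carries mass 1/n.
    a = np.full(n, 1.0 / n)
    b = np.full(n, 1.0 / n)
    u = np.ones(n)
    v = np.ones(n)
    # Sinkhorn iterations: alternately rescale rows and columns.
    for _ in range(n_iters):
        u = a / (K @ v)
        v = b / (K.T @ u)
    # Transport plan; for small eps it is close to a permutation matrix / n.
    P = u[:, None] * K * v[None, :]
    # Each column of n * P is a set of convex weights over the inputs.
    soft_sorted = n * (P.T @ x)
    return soft_sorted, P

values = np.array([1.0, 0.0, 0.6, 0.3])
soft_sorted, plan = sinkhorn_soft_sort(values)
print(soft_sorted)
```

The loop is just matrix-vector products, which is why it is cheap and GPU-friendly. Note that `eps` is on the scale of the squared differences in `x`, so inputs far outside [0, 1] would need rescaling or a different `eps`.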