The Python overhead of launching big ML jobs is nontrivial, so I think speeding that up would be meaningful. (I mean the initial tracing and other setup, not what happens once the GPUs are actually doing the work.)
That seems more like tracing overhead than Python overhead. I believe the original JIT proposal would not help with that at all, since ML workloads basically do their own JIT. The post being discussed, however, pushes for a general framework and good tracing support, so it might help ML workloads.
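
To make "do their own JIT" a bit more concrete, here is a toy sketch of trace-once, replay-later. It is not how any real ML framework is implemented; `Tracer`, `toy_jit`, and `model` are made-up names for illustration. The point is only that the user's Python function body runs once, at startup, to record a graph, and later calls never re-enter it.

```python
class Tracer:
    """Proxy value that records arithmetic into a shared op list instead of computing."""

    def __init__(self, graph, name):
        self.graph = graph
        self.name = name

    def _record(self, op, other):
        # Allocate a fresh variable name and append one graph node.
        out = Tracer(self.graph, f"v{len(self.graph)}")
        rhs = other.name if isinstance(other, Tracer) else other
        self.graph.append((out.name, op, self.name, rhs))
        return out

    def __add__(self, other):
        return self._record("add", other)

    def __mul__(self, other):
        return self._record("mul", other)


def toy_jit(fn):
    """Hypothetical decorator: trace fn once in Python, then replay the recorded graph."""
    cache = {}

    def wrapper(x):
        if "graph" not in cache:
            graph = []
            out = fn(Tracer(graph, "x"))  # the Python body of fn runs here, once
            cache["graph"] = graph
            cache["out"] = out.name
            wrapper.trace_count += 1
        # Replay: a real framework would hand the graph to a compiler/GPU instead.
        env = {"x": x}
        for name, op, lhs, rhs in cache["graph"]:
            a = env[lhs]
            b = env[rhs] if isinstance(rhs, str) else rhs
            env[name] = a + b if op == "add" else a * b
        return env[cache["out"]]

    wrapper.trace_count = 0
    return wrapper


@toy_jit
def model(x):
    # Stand-in for a deep network: many small ops unrolled by a Python loop.
    for _ in range(1000):
        x = x * 1.0 + 0.0
    return x
```

Calling `model(3.0)` and then `model(2.0)` traces only on the first call (`model.trace_count` stays at 1). In this toy, the startup cost is the Python-level loop that builds the 2000-node graph, which is the kind of work the two comments above are disagreeing about how to label.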