Hmm, well if you mean torch.compile, y'all should still check out stable-fast, which claims ~16 ms/iter on a 4090, about twice as fast as torch.compile:
https://github.com/chengzeyi/stable-fast#rtx-4090-512x512-ba...