- The gain in Stable Diffusion is modest (15%-25%, last I checked?).
- Torch 2.0 only supports static inputs. In actual usage scenarios, this means frequent, lengthy recompiles.
- Eventually, these recompiles will overload the compilation cache and torch.compile will stop functioning.
- Some common augmentations (like TomeSD) break compilation, force recompiles, make compilation take forever, or kill the performance gains.
- There are other miscellaneous bugs, like compilation freezing the Python thread and causing networking timeouts in web UIs, or errors with embeddings.
- Dynamic input in Torch 2.1 nightly fixes many of these issues, but was only maybe working a week ago? See https://github.com/pytorch/pytorch/issues/101228#issuecommen...
- TVM and AITemplate have massive performance gains: ~2x or more for AIT, not sure about an exact number for TVM.
- AIT supported dynamic input before torch.compile did, and requires no recompilation after the initial compile. Also, weights (models and LoRAs) can be swapped out without a recompile.
- TVM supports very performant Vulkan inference, which would massively expand hardware compatibility.

Note that the popular SD Web UIs don't support any of this, with two exceptions I know of: VoltaML (with WIP AIT support) and the Windows DirectML fork of A1111 (which uses optimized ONNX models, I think). There is about 0% chance of ML compilation support in A1111, and the HF diffusers UIs are less bleeding edge and less focused on performance/compatibility.

And yes, triton torch.compile is aimed at training. There is an alternative backend (Hidet) that explicitly targets inference, but it does not work with Stable Diffusion yet.
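To make the static-input point concrete, here is a toy model in plain Python of a compiler that specializes on input shape and has a bounded cache. It is only a sketch of the failure mode described above, not how torch.compile is actually implemented: `ShapeSpecializedCompiler`, `simulate`, and the `cache_limit` value are all made-up names for illustration, and the `(1, 4, h // 8, w // 8)` latent shape is just an assumed stand-in for what a UI would feed the UNet at different resolutions.

```python
from typing import Callable, Dict, List, Tuple

Shape = Tuple[int, ...]


class ShapeSpecializedCompiler:
    """Toy stand-in for a static-shape compiler (hypothetical, not torch's logic)."""

    def __init__(self, cache_limit: int) -> None:
        self.cache_limit = cache_limit
        self.cache: Dict[Shape, Callable[[], str]] = {}
        self.compile_count = 0   # how many (slow) compiles were triggered
        self.fallback_count = 0  # how many calls ran uncompiled

    def _compile(self, shape: Shape) -> Callable[[], str]:
        # Stands in for an expensive compile step specialized to one shape.
        self.compile_count += 1
        return lambda: f"compiled{shape}"

    def run(self, shape: Shape) -> str:
        if shape in self.cache:
            # Same shape as before: reuse the compiled artifact.
            return self.cache[shape]()
        if len(self.cache) >= self.cache_limit:
            # Cache is full: stop compiling and run the slow path instead.
            self.fallback_count += 1
            return "eager"
        # New shape with room in the cache: pay for a recompile.
        self.cache[shape] = self._compile(shape)
        return self.cache[shape]()


def simulate(resolutions: List[Tuple[int, int]], cache_limit: int):
    """Run one toy 'inference' per (height, width) request."""
    compiler = ShapeSpecializedCompiler(cache_limit)
    # Assumed latent layout: batch 1, 4 channels, 1/8 of the image resolution.
    results = [compiler.run((1, 4, h // 8, w // 8)) for (h, w) in resolutions]
    return compiler, results


requests = [(512, 512), (512, 768), (512, 512), (768, 768), (1024, 1024)]
compiler, results = simulate(requests, cache_limit=3)
print(compiler.compile_count, compiler.fallback_count)  # 3 1
```

Every new resolution costs a compile, a repeated one is free, and once the cache is full the last request silently gets no speedup at all. A compiler with real dynamic-shape support would avoid this by keying the cache on something coarser than the exact shape.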