by sergiopreira
67 days ago
Most 'runs on Mac' ports are a wrapper around a cloud call or a quantized shell of the original model. Going after the CUDA-specific kernels with pure-PyTorch alternatives is the kind of work that ages well, because the next CUDA-locked research release is three weeks away. One question: how much of the gather-scatter sparse conv is reusable for other TRELLIS-like architectures, or is it bespoke to this one?
|
The main thing that's TRELLIS-specific is the neighbor cache key format, but that's a few lines to adapt.
The SDPA attention swap is even more reusable - it's just padding variable-length sequences into batches and calling torch.nn.functional.scaled_dot_product_attention.
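
For readers who want to see the shape of that idea, here is a minimal sketch of pad-then-SDPA. It is not the code from the port: the helper name `padded_sdpa`, the `(heads, length, head_dim)` layout, and the self-attention assumption (queries and keys share a length per sequence) are all illustrative choices. It needs PyTorch 2.0 or later and assumes every sequence has at least one token.

```python
import torch
import torch.nn.functional as F


def padded_sdpa(qs, ks, vs):
    """Self-attention over variable-length sequences via padding plus SDPA.

    qs, ks, vs: lists of tensors, each shaped (heads, len_i, head_dim).
    Hypothetical helper for illustration; returns a list of per-sequence
    outputs with the padding stripped off again.
    """
    lengths = [q.shape[1] for q in qs]
    max_len = max(lengths)

    def pad_and_stack(tensors):
        # Zero-pad the sequence dimension on the right, then stack into
        # one (batch, heads, max_len, head_dim) tensor.
        return torch.stack(
            [F.pad(t, (0, 0, 0, max_len - t.shape[1])) for t in tensors]
        )

    q, k, v = pad_and_stack(qs), pad_and_stack(ks), pad_and_stack(vs)

    # Boolean key mask: True means "this key may be attended to".
    # Shape (batch, 1, 1, max_len) broadcasts over heads and query positions.
    lens = torch.tensor(lengths, device=q.device)
    key_mask = torch.arange(max_len, device=q.device)[None, :] < lens[:, None]
    attn_mask = key_mask[:, None, None, :]

    out = F.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)

    # Padded query rows hold meaningless values, so slice them away.
    return [out[i, :, :n] for i, n in enumerate(lengths)]
```

Masking only the keys is enough here because the padded query rows are discarded at the end. The trade-off is wasted compute on padding when lengths vary a lot, so bucketing sequences by length before batching may be worth it.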