by sergiopreira 67 days ago
Most 'runs on Mac' ports are a wrapper around a cloud call or a quantized shell of the original model. Going after the CUDA-specific kernels with pure-PyTorch alternatives is the kind of work that ages well, because the next CUDA-locked research release is three weeks away. One question: how much of the gather-scatter sparse conv is reusable for other TRELLIS-like architectures, or is it bespoke to this one?
1 comment

The gather-scatter sparse conv should be fairly generic. Any model using 3x3x3 or 5x5x5 sparse convolutions on voxel grids could use it directly.

The main thing that's TRELLIS-specific is the neighbor cache key format, but that's a few lines to adapt.
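
For illustration, here is a minimal sketch of the gather-scatter idea in pure PyTorch. It is not the port's actual code: the function names `build_neighbor_map` and `sparse_conv3d`, the weight layout `(K^3, C_in, C_out)`, and the assumption that outputs live only at the input's active voxels are all hypothetical choices made for the example. The neighbor map is the part a real implementation would cache.

```python
import itertools
import torch

def build_neighbor_map(coords, kernel_size=3):
    """For each kernel offset, return (src, dst) index tensors of active voxel pairs.

    coords: (N, 3) integer tensor of active voxel coordinates.
    Hypothetical helper; a real port would cache this per coordinate set.
    """
    coord_list = [tuple(c) for c in coords.tolist()]
    lookup = {c: i for i, c in enumerate(coord_list)}
    r = kernel_size // 2
    pairs = []
    for dx, dy, dz in itertools.product(range(-r, r + 1), repeat=3):
        src, dst = [], []
        for out_idx, (x, y, z) in enumerate(coord_list):
            in_idx = lookup.get((x + dx, y + dy, z + dz))
            if in_idx is not None:  # neighbor at this offset is active
                src.append(in_idx)
                dst.append(out_idx)
        pairs.append((torch.tensor(src, dtype=torch.long),
                      torch.tensor(dst, dtype=torch.long)))
    return pairs

def sparse_conv3d(feats, weight, pairs):
    """feats: (N, C_in); weight: (K^3, C_in, C_out), same offset order as pairs."""
    out = feats.new_zeros(feats.shape[0], weight.shape[-1])
    for k, (src, dst) in enumerate(pairs):
        if src.numel() == 0:
            continue
        # Gather neighbor features, apply this offset's weights, scatter-add to outputs.
        out.index_add_(0, dst, feats[src] @ weight[k])
    return out
```

Nothing here depends on the model: swapping `kernel_size=5` only changes the number of offsets, which is why the approach would carry over to other voxel-grid architectures.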

The SDPA attention swap is even more reusable - it's just padding variable-length sequences into batches and calling torch.nn.functional.scaled_dot_product_attention.
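
A rough sketch of that padding approach, assuming per-sequence tensors shaped `(heads, length, head_dim)` and at least one key per sequence. The helper name `padded_sdpa` and the list-of-tensors interface are hypothetical; only `torch.nn.functional.scaled_dot_product_attention` and its `attn_mask` argument are real API (PyTorch 2.0+).

```python
import torch
import torch.nn.functional as F

def padded_sdpa(q_list, k_list, v_list):
    """Attention over variable-length sequences by padding them into one batch.

    q_list[i]: (H, Lq_i, D); k_list[i] and v_list[i]: (H, Lk_i, D).
    Returns a list of (H, Lq_i, D) outputs with the padding trimmed off.
    """
    B = len(q_list)
    H, _, D = q_list[0].shape
    Lq = max(q.shape[1] for q in q_list)
    Lk = max(k.shape[1] for k in k_list)
    q = q_list[0].new_zeros(B, H, Lq, D)
    k = k_list[0].new_zeros(B, H, Lk, D)
    v = v_list[0].new_zeros(B, H, Lk, D)
    key_mask = torch.zeros(B, Lk, dtype=torch.bool, device=q.device)
    for i, (qi, ki, vi) in enumerate(zip(q_list, k_list, v_list)):
        q[i, :, :qi.shape[1]] = qi
        k[i, :, :ki.shape[1]] = ki
        v[i, :, :vi.shape[1]] = vi
        key_mask[i, :ki.shape[1]] = True  # True marks real (non-padding) keys
    # Boolean mask of shape (B, 1, 1, Lk) broadcasts over heads and query positions.
    out = F.scaled_dot_product_attention(q, k, v, attn_mask=key_mask[:, None, None, :])
    # Padded query rows are computed but discarded here.
    return [out[i, :, :qi.shape[1]] for i, qi in enumerate(q_list)]
```

The trade-off is wasted compute on padding when sequence lengths vary widely, in exchange for using only stock PyTorch.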