- persistent CUDA kernel
- tiled processing with overlapping reads/writes
- model designed with specific constraints in mind
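
The tiling point can be sketched on the CPU. This is only a rough illustration of the access pattern, not the kernel itself. It assumes "overlapping" means that neighbouring tiles read a shared halo of input while each tile writes only its own slice; the 3-point sum, the names `stencil_reference` and `stencil_tiled`, and the tile size are all hypothetical choices made for the example.

```python
HALO = 1  # assumed stencil radius: each output needs one neighbour per side


def stencil_reference(x):
    """3-point sum over the whole array, treating out-of-range values as 0."""
    n = len(x)
    return [
        (x[i - 1] if i > 0 else 0) + x[i] + (x[i + 1] if i < n - 1 else 0)
        for i in range(n)
    ]


def stencil_tiled(x, tile=4):
    """Same computation, one tile at a time.

    Each tile reads [start - HALO, stop + HALO), so neighbouring tiles read
    overlapping input, but each tile writes only its own [start, stop) slice.
    """
    n = len(x)
    out = [0] * n
    for start in range(0, n, tile):
        stop = min(start + tile, n)
        lo = max(start - HALO, 0)
        hi = min(stop + HALO, n)
        local = x[lo:hi]  # stand-in for a per-tile staging buffer
        for i in range(start, stop):
            j = i - lo  # index of element i inside the staged tile
            left = local[j - 1] if j > 0 else 0
            right = local[j + 1] if j + 1 < len(local) else 0
            out[i] = left + local[j] + right
    return out


data = list(range(10))
assert stencil_tiled(data, tile=4) == stencil_reference(data)
```

In a persistent kernel, a fixed set of resident thread blocks would typically loop over tiles like these instead of launching one block per tile; the loop over `start` plays that role here.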