by magicalhippo
18 days ago
If you implement this on the GPU, my understanding is that you can get the 4th-order interpolation quite cheaply by exploiting the bilinear texture sampling hardware [1]. Instead of reading 16 grid values and combining them to get the interpolated sample value, you fetch 4 bilinearly filtered samples and combine those. Thanks to the hardware filtering, each of those bilinear samples costs basically the same as reading an unfiltered value.

[1]: https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-...
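To make the 16-vs-4 claim concrete, here is a rough CPU-side sketch in Python. It assumes cubic B-spline weights (the trick relies on the weights being non-negative), texel centers at integer coordinates, and clamp-to-edge addressing. The `bilinear` function is only a software stand-in for a hardware-filtered fetch, and all function names are made up for illustration. Real hardware typically uses reduced-precision filter weights, so results on a GPU would differ slightly.

```python
import math

def bspline_weights(t):
    """Cubic B-spline weights for the four taps around a sample, t in [0, 1)."""
    w0 = (1 - t) ** 3 / 6
    w1 = (3 * t**3 - 6 * t**2 + 4) / 6
    w2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    w3 = t**3 / 6
    return w0, w1, w2, w3

def texel(grid, x, y):
    """Unfiltered read with clamp-to-edge addressing (an assumption)."""
    h, w = len(grid), len(grid[0])
    return grid[min(max(y, 0), h - 1)][min(max(x, 0), w - 1)]

def bilinear(grid, x, y):
    """Software stand-in for one hardware bilinear fetch."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    top = (1 - fx) * texel(grid, ix, iy) + fx * texel(grid, ix + 1, iy)
    bot = (1 - fx) * texel(grid, ix, iy + 1) + fx * texel(grid, ix + 1, iy + 1)
    return (1 - fy) * top + fy * bot

def bicubic_16(grid, x, y):
    """Reference version: 16 unfiltered reads, weighted directly."""
    ix, iy = math.floor(x), math.floor(y)
    wx = bspline_weights(x - ix)
    wy = bspline_weights(y - iy)
    return sum(wy[j] * wx[i] * texel(grid, ix - 1 + i, iy - 1 + j)
               for j in range(4) for i in range(4))

def fold(i, t):
    """Fold four taps along one axis into two linear fetches.

    Returns the two combined weights and the two fetch positions.
    """
    w0, w1, w2, w3 = bspline_weights(t)
    g0, g1 = w0 + w1, w2 + w3
    # Each position sits between a tap pair so that linear filtering
    # reproduces the ratio of the two original weights.
    return g0, g1, i - 1 + w1 / g0, i + 1 + w3 / g1

def bicubic_4(grid, x, y):
    """Same result from 4 bilinear fetches."""
    ix, iy = math.floor(x), math.floor(y)
    gx0, gx1, hx0, hx1 = fold(ix, x - ix)
    gy0, gy1, hy0, hy1 = fold(iy, y - iy)
    row0 = gx0 * bilinear(grid, hx0, hy0) + gx1 * bilinear(grid, hx1, hy0)
    row1 = gx0 * bilinear(grid, hx0, hy1) + gx1 * bilinear(grid, hx1, hy1)
    return gy0 * row0 + gy1 * row1
```

Calling `bicubic_16(grid, x, y)` and `bicubic_4(grid, x, y)` on the same grid should agree up to floating-point rounding. On a GPU, the four `bilinear` calls would be four filtered texture fetches instead.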
|
I suppose that, because the 16 fetches generally go to nearby memory regions, the performance improvement may not be substantial on recent GPUs: most of those reads would hit L1 or L2 cache anyway.