by magicalhippo 18 days ago
If you implement this on the GPU, my understanding is that you can get the 4th-order interpolation quite cheaply by exploiting the bilinear texture sampling hardware[1].

So instead of reading 16 grid values and combining them to get the interpolated sample value, you can fetch 4 bilinearly filtered samples and combine those. And thanks to the hardware filtering, those bilinear samples cost basically the same as reading an unfiltered value.
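To make the 16-versus-4 claim concrete, here is a minimal CPU sketch in Python that emulates the idea. It assumes cubic B-spline weights (all non-negative, which is what lets adjacent taps fold into a single linear fetch), texel centres at integer coordinates, and clamp-to-edge addressing. All function names are made up for illustration, and `bilinear` only stands in for what the texture unit would do in hardware.

```python
import math

def bspline_weights(t):
    """Cubic B-spline weights for the four taps around a sample, t in [0, 1)."""
    w0 = (1 - t) ** 3 / 6
    w1 = (3 * t**3 - 6 * t**2 + 4) / 6
    w2 = (-3 * t**3 + 3 * t**2 + 3 * t + 1) / 6
    w3 = t**3 / 6
    return w0, w1, w2, w3

def texel(grid, ix, iy):
    """Unfiltered read with clamp-to-edge addressing (an assumption)."""
    h, w = len(grid), len(grid[0])
    return grid[min(max(iy, 0), h - 1)][min(max(ix, 0), w - 1)]

def bilinear(grid, x, y):
    """Stand-in for one hardware bilinear fetch at coordinates (x, y)."""
    ix, iy = math.floor(x), math.floor(y)
    fx, fy = x - ix, y - iy
    top = (1 - fx) * texel(grid, ix, iy) + fx * texel(grid, ix + 1, iy)
    bot = (1 - fx) * texel(grid, ix, iy + 1) + fx * texel(grid, ix + 1, iy + 1)
    return (1 - fy) * top + fy * bot

def bicubic_16(grid, x, y):
    """Reference: read 16 grid values and combine them directly."""
    ix, iy = math.floor(x), math.floor(y)
    wx = bspline_weights(x - ix)
    wy = bspline_weights(y - iy)
    return sum(wy[j] * wx[i] * texel(grid, ix - 1 + i, iy - 1 + j)
               for j in range(4) for i in range(4))

def fold(i, w):
    """Fold four taps along one axis into two (weight, position) linear fetches."""
    w0, w1, w2, w3 = w
    g0, g1 = w0 + w1, w2 + w3
    # w0*f[i-1] + w1*f[i] == g0 * lerp(f[i-1], f[i], w1/g0), likewise for the other pair
    return (g0, i - 1 + w1 / g0), (g1, i + 1 + w3 / g1)

def bicubic_4(grid, x, y):
    """Same result from 4 bilinear fetches at shifted positions."""
    ix, iy = math.floor(x), math.floor(y)
    fx = fold(ix, bspline_weights(x - ix))
    fy = fold(iy, bspline_weights(y - iy))
    return sum(gy * gx * bilinear(grid, px, py)
               for gy, py in fy for gx, px in fx)
```

Both functions should agree to within floating-point rounding on the CPU. On real hardware the filter fraction is typically stored in low-precision fixed point, so the 4-fetch version would be slightly less accurate there.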

[1]: https://developer.nvidia.com/gpugems/gpugems2/part-iii-high-...

1 comment

Yes, I do 4th-order interpolation (M4') on the GPU. This paper covers 3rd order, though the methods may extend.

I suppose that, because the fetches generally go to nearby memory regions, they mostly hit L1 and L2 on recent GPUs, so there may not be a substantial performance improvement.