by xenonite 4616 days ago
On the contrary. Consider that memory is the bottleneck when performing the blur. As I understand it, you would create five instances of the image (one centered at 50%, plus one each shifted left, right, up, and down at 12.5%). That would make the bottleneck even worse.

Additionally, the advantage of computing pixel by pixel is that the shader can run massively in parallel.


I think I wasn't clear. I mean: send 5 textured polygons to the 3D hardware, all using one texture with various darkenings and offsets (which, if nothing else, can be done via lighting, though there are probably other, easier ways), and let the hardware do the blending en masse. Instead of using shaders and blinding the hardware to what you're doing, you may let it render the polygons much faster, on an optimized path. Or it may not. But it's worth a try.
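
To make the idea concrete, here is a rough CPU-side sketch of what blending five offset, darkened copies of one image would compute. The weights come from the thread (50% centered, 12.5% per shifted copy). The names `LAYERS`, `shifted`, and `blur_by_layers` are made up for illustration, and clamping at the image edges is an assumption. It says nothing about how fast real 3D hardware would do this.

```python
# (dx, dy, weight) for each copy: one centered at 50%, four shifted at 12.5%.
LAYERS = [
    (0, 0, 0.5),
    (-1, 0, 0.125),
    (1, 0, 0.125),
    (0, -1, 0.125),
    (0, 1, 0.125),
]


def shifted(image, dx, dy):
    """Return a copy of image sampled at (x + dx, y + dy).

    Edge handling is an assumption: coordinates are clamped to the image.
    """
    h, w = len(image), len(image[0])
    return [
        [image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
         for x in range(w)]
        for y in range(h)
    ]


def blur_by_layers(image):
    """Accumulate the five weighted copies, like additive blending of polygons."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for dx, dy, weight in LAYERS:
        layer = shifted(image, dx, dy)  # one full-size copy per layer
        for y in range(h):
            for x in range(w):
                out[y][x] += weight * layer[y][x]
    return out
```

Note that this version touches the whole image once per layer, which is the memory traffic the parent comment is worried about.
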
I would think it is going to depend on your cache size. Piecewise will be better if the image can't all fit in the cache, but if the image is small enough, everything fits in the cache and working on the whole image at once should be fine.

Or, do you mean that memory is the bottleneck as in, shipping the image to the GPU's memory space?

Yes, it really depends on your cache size.

And the problem with that is, you can't know the cache size in advance. Profiling helps, but it only leads to a local optimization for the particular GPUs you profiled.

If you want your code to run well on any GPU, the pixel-by-pixel approach usually works best. The GPU scheduler can then run as many neighboring threads as possible on each subprocessor. Note that every subprocessor has its own local cache, which is really quick.
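
As a rough illustration of the pixel-by-pixel approach, the sketch below computes each output pixel from reads of the input only, so the pixels can be evaluated in any order (which is what lets a GPU spread them across threads). It is plain Python, not shader code. The names `blur_pixel` and `blur_per_pixel` are hypothetical, the weights are the 50% / 12.5% ones from the thread, and clamping at the edges is an assumption.

```python
def blur_pixel(image, x, y):
    """Compute one output pixel from its own value and its four neighbors."""
    h, w = len(image), len(image[0])

    def at(px, py):
        # Assumed edge handling: clamp coordinates to the image.
        return image[min(max(py, 0), h - 1)][min(max(px, 0), w - 1)]

    return (0.5 * at(x, y)
            + 0.125 * (at(x - 1, y) + at(x + 1, y)
                       + at(x, y - 1) + at(x, y + 1)))


def blur_per_pixel(image, order=None):
    """Blur by visiting pixels one at a time, in an arbitrary order.

    `order` is an optional function that reorders the coordinate list; the
    result does not depend on it because no pixel reads another's output.
    """
    h, w = len(image), len(image[0])
    coords = [(x, y) for y in range(h) for x in range(w)]
    if order is not None:
        coords = order(coords)
    out = [[0.0] * w for _ in range(h)]
    for x, y in coords:
        out[y][x] = blur_pixel(image, x, y)
    return out
```

Neighboring pixels read overlapping inputs, which is why running neighboring threads together can benefit from a small local cache.
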