by Const-me 3146 days ago
I wonder why people implement such things on CPU?

PCI Express is ~100 Gbit/sec, much faster than any network interface. Internally, a GPU can resize these images an order of magnitude faster than that; see the fillrate columns in the GPU spec.

This isn't just resampling an image: it means decoding a variety of image (and even video) formats, decompressing the selected frame, performing the actual resize, and then compressing the result. If the GPU resample doesn't save more than the setup overhead, it's an immediate loss. Even if it does, there's an engineering cost, since you now need to make sure that all of your servers have GPUs available, that your chosen implementation code supports all of them with acceptable quality and error handling, etc.

Since the GPU hardware has become commonplace, there's definitely a lot more attention on using it in the server space, and I think it'll become common in the next few years. But that has a migration cost for early adopters, since you're relying on less mature projects for critical functions. Internet-facing image processing involves a bunch of tedious but important work: handling format variations and errors (it'll be reported as a bug in your software if the image opens in a browser and/or Photoshop), making sure that you handle gamma/colorspace consistently, etc.

If you're trying to get a production-ready server out the door, it's really tempting not to deal with any of that once you hit the point where it's fast enough that engineering time costs more than the server savings.

> This isn't just resampling an image

GPUs can do that, too: http://fastcompression.com/products/jpeg/cuda-jpeg.htm

> you now need to make sure that all of your servers have GPUs available

OP is running on Google’s cloud: “n1-standard-16 host type, peaking at 12 instances on a typical day.” That instance costs $0.76/hour. Adding an NVIDIA Tesla K80 is $0.7 extra.
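For a rough sense of the break-even, here is a back-of-the-envelope sketch using only the two prices quoted above. It assumes the $0.7 is per hour like the instance price, and the 4x speedup in the usage lines is a made-up illustration, not a measurement.

```python
import math

# Prices quoted above; the GPU add-on is ASSUMED to be per hour as well.
CPU_INSTANCE_PER_HOUR = 0.76
GPU_ADDON_PER_HOUR = 0.70


def break_even_speedup(cpu_price, gpu_addon):
    """Throughput multiple a GPU-equipped instance needs in order to
    match a CPU-only instance on cost per image."""
    return (cpu_price + gpu_addon) / cpu_price


def instances_needed(cpu_only_instances, speedup):
    """GPU-equipped instances needed to replace a CPU-only fleet,
    rounded up to whole instances."""
    return math.ceil(cpu_only_instances / speedup)


if __name__ == "__main__":
    s = break_even_speedup(CPU_INSTANCE_PER_HOUR, GPU_ADDON_PER_HOUR)
    print(f"break-even speedup: {s:.2f}x")
    # Hypothetical 4x speedup applied to the 12-instance peak from the article.
    print("instances at a hypothetical 4x:", instances_needed(12, 4))
```

In other words, under these assumptions a GPU-equipped instance has to get through roughly twice as many images per hour as a CPU-only one before it pays for itself.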

> it's really tempting not to deal with any of that

Yeah, that’s understandable. But the original article dealt with a lot of strange technologies to get the performance they wanted, and ended up with much slower performance than what’s possible with a GPU.

> > This isn't just resampling an image

> GPUs can do that, too: http://fastcompression.com/products/jpeg/cuda-jpeg.htm

Agreed - but for how many different formats, and how well do those implementations support all of the various format options for things like bit depth or palettes, compression variants, etc.? That's not just things like compliance testing – itself a big problem – but also handling all of the slightly non-compliant data in the wild which users will inevitably expect to work.

(I'm somewhat biased having spent time dealing with JPEG 2000 imagery where various lapses on the standards side meant that it's still common to find images which don't display correctly in one or more implementations but are silently reported as correct in others)

Again, I'm not arguing that doing this on a GPU isn't a good idea — the hardware has become common enough that it's reasonable to assume availability for anyone who cares — but just that there's significant overhead cost for anyone who needs to handle images from unconstrained sources. It'll happen but this kind of thing always takes longer than it seems like it should.

> significant overhead cost for anyone who needs to handle images from unconstrained sources

Flickr is doing just that, and they’ve been using GPUs for more than 2 years already:

http://code.flickr.net/2015/06/25/real-time-resizing-of-flic...

> It'll happen but this kind of thing always takes longer than it seems like it should.

I think the main reason for that is lazy software developers reluctant to learn new stuff.

We did consider using a GPU, but it seems like you have fewer options there. We were really picky about the resize kernel used and it seems like with GPU you may not always get the same kernels available. Also presumably that only handles resizing, not compressing/decompressing, which make up a pretty sizeable portion of the workload.

> with GPU you may not always get the same kernels available

No kernels are available _out of the box_. You write a pixel shader and implement any kernel you want, or any other resizing method besides kernels: https://stackoverflow.com/a/42179924/126995
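To make "implement any kernel" concrete, here is a minimal CPU-side sketch in Python of the per-output-pixel math such a shader would run. It is not shader code and not the method from the linked answer; Lanczos-3 is just an arbitrary example kernel, edges are clamped, gamma is ignored, and the function names are made up for illustration.

```python
import math


def lanczos(x, a=3):
    """Lanczos kernel of radius a: sinc(x) * sinc(x / a) for |x| < a, else 0."""
    if x == 0.0:
        return 1.0
    if abs(x) >= a:
        return 0.0
    px = math.pi * x
    return a * math.sin(px) * math.sin(px / a) / (px * px)


def resize_row(row, new_len, kernel=lanczos, radius=3):
    """Resample a 1-D list of samples to new_len samples with the given kernel."""
    old_len = len(row)
    scale = old_len / new_len
    # When shrinking, stretch the kernel so it also low-pass filters the input.
    stretch = max(scale, 1.0)
    support = radius * stretch
    out = []
    for i in range(new_len):
        center = (i + 0.5) * scale  # output pixel centre in source coordinates
        total = 0.0
        weight_sum = 0.0
        for j in range(math.floor(center - support), math.ceil(center + support) + 1):
            w = kernel((j + 0.5 - center) / stretch)
            if w == 0.0:
                continue
            total += w * row[min(max(j, 0), old_len - 1)]  # clamp at the edges
            weight_sum += w
        out.append(total / weight_sum)  # normalise so flat areas stay flat
    return out


def resize_image(img, new_w, new_h):
    """Separable resize of a list-of-rows image: rows first, then columns."""
    rows = [resize_row(r, new_w) for r in img]
    cols = [resize_row(list(c), new_h) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]
```

Swapping in a different kernel only means passing another function and radius to `resize_row`; the surrounding loop stays the same.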

> that only handles resizing, not compressing/decompressing

In my previous comment there’s a link to a commercially available JPEG codec, 100% compliant with the JPEG Baseline Standard, that does both compression and decompression.

Yikes. If we had had to write our own image resizing kernel, this would have taken much longer. And ok, it can do JPEG but what about PNG, GIF, and WEBP?

> If we had had to write our own image resizing kernel, this would have taken much longer

I don’t disagree but this is very subjective.

You don’t need to invent anything, you only need to carefully implement a well known approach, e.g. this one: https://developer.nvidia.com/gpugems/GPUGems/gpugems_ch24.ht...

Also there are third-party libraries for that, e.g. here’s one from the same company that makes the JPEG codec: http://fastcompression.com/products/resizer/gpu-resizer.htm

> what about PNG, GIF, and WEBP?

As far as I understand, your goal was to cut server costs, right?

I assume the majority of pictures on the Internet are JPEGs. If you process them on the GPU, this leaves you 16 virtual CPUs you’ve already paid for just sitting idle and waiting for the GPU to finish the job. No need to do everything on GPU.

P.S. Some other people already implemented what I’m telling you: http://code.flickr.net/2015/06/25/real-time-resizing-of-flic...
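The split suggested above (JPEGs to the GPU, everything else on the otherwise idle CPU cores) could be sketched roughly like this. The magic-byte checks are the standard file signatures; `sniff_format` and `choose_backend` are hypothetical names, and a real service would still have to handle malformed or mislabeled files.

```python
def sniff_format(data: bytes) -> str:
    """Guess the container format from the leading bytes of the file."""
    if data.startswith(b"\xff\xd8\xff"):
        return "jpeg"
    if data.startswith(b"\x89PNG\r\n\x1a\n"):
        return "png"
    if data[:6] in (b"GIF87a", b"GIF89a"):
        return "gif"
    # WebP is a RIFF container: "RIFF", a 4-byte size, then "WEBP".
    if data[:4] == b"RIFF" and data[8:12] == b"WEBP":
        return "webp"
    return "unknown"


def choose_backend(data: bytes, gpu_available: bool = True) -> str:
    """Send JPEGs to the GPU path; everything else stays on the CPU path."""
    if gpu_available and sniff_format(data) == "jpeg":
        return "gpu"
    return "cpu"
```

Anything the sniffer does not recognise falls through to the CPU path, so the existing CPU pipeline remains the fallback for odd inputs.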

Most probably it's because of the time it takes to push the image onto the GPU and then back to the CPU.
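For scale, here is a quick estimate of that transfer time using the ~100 Gbit/sec PCIe figure from the top comment. The image size (4000x3000, 4 bytes per pixel, uncompressed) is a hypothetical example, and the sketch ignores fixed per-transfer latency and driver overhead, which may matter more for small images.

```python
def transfer_ms(width, height, bytes_per_pixel, link_gbit_per_s):
    """One-way time in milliseconds to move an uncompressed image over the link."""
    bits = width * height * bytes_per_pixel * 8
    return bits / (link_gbit_per_s * 1e9) * 1e3


if __name__ == "__main__":
    one_way = transfer_ms(4000, 3000, 4, 100)
    print(f"one way: {one_way:.2f} ms, round trip: {2 * one_way:.2f} ms")
```

Under these assumptions the copy costs a few milliseconds each way, so whether it dominates depends on how long the decode, resize, and encode steps take on each side.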