by pizza 1418 days ago
Ooh, now that is very interesting. I would really love to see how this speeds up the run-time of fpng as a whole, if you have any numbers. It looks like fjxl [0] and fpnge [1] (which also uses AVX2) are at the Pareto front for lossless image compression right now [2], but if this speeds things up significantly then it's possible there'll be a huge shakeup!

[0] https://github.com/libjxl/libjxl/tree/main/experimental/fast...

[1] https://github.com/veluca93/fpnge

[2] https://twitter.com/richgel999/status/1485976101692358656

Unfortunately I haven’t had the time to do a proper benchmark, and the fpng test executable only decodes/encodes a single image which produces very noisy/inconclusive results. However, I’m under the impression that it doesn’t make a large difference in terms of overall time.

fpnge (which I wasn’t aware of until now) appears to already be using a very similar (identical?) algorithm, so I suspect the relative performance of fpng and fpnge would not be significantly impacted by this change.
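
For anyone following along, the checksum under discussion is Adler-32, which every zlib stream inside a PNG ends with. Below is a minimal scalar sketch in Python, not the AVX2 code from fpng or fpnge; the function and constant names are mine, and it is only meant to show the two running sums that the vectorized versions compute in bulk.

```python
MOD_ADLER = 65521  # largest prime below 2**16, as specified for Adler-32


def adler32(data: bytes) -> int:
    """Plain byte-at-a-time Adler-32 (reference sketch, not optimized)."""
    a = 1  # running sum of bytes, seeded with 1
    b = 0  # running sum of the successive values of a
    for byte in data:
        a = (a + byte) % MOD_ADLER
        b = (b + a) % MOD_ADLER
    # The checksum packs b into the high 16 bits and a into the low 16 bits.
    return (b << 16) | a


if __name__ == "__main__":
    print(hex(adler32(b"Wikipedia")))
```

The result can be cross-checked against `zlib.adler32` from the standard library. Fast implementations typically defer the modulo across many bytes instead of applying it per byte, which is where SIMD helps.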

As someone who has been recently optimising fpnge, Adler32 computation is pretty much negligible regarding overall runtime. The Huffman coding and filter search take up most of the time. (IIRC fpng doesn't do any filter search, but its Huffman encoding isn't vectorized, so I'd expect that to dominate fpng's runtime.)

If image encode/decode speed is the only concern, libjpeg-turbo is going to be orders of magnitude faster than any of these lossless schemes. With JPEG, you could encode 1080p bitmaps in <10 milliseconds (per thread) on any consumer PC made in the last decade.

The frequency domain is a really powerful place to operate in when you are dealing with this amount of data.
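
To put the 1080p figure on the same scale as the MB/s numbers usually quoted for codecs, here is a rough conversion. It is a sketch under assumptions not stated in the comment: 24-bit RGB input, throughput counted in raw pixel bytes, and 1 MB = 10^6 bytes. The function name is made up for illustration.

```python
def raw_throughput_mb_s(width: int, height: int,
                        bytes_per_pixel: int, seconds: float) -> float:
    """Raw (uncompressed) pixel throughput in MB/s, with 1 MB = 10**6 bytes."""
    frame_bytes = width * height * bytes_per_pixel
    return frame_bytes / seconds / 1e6


if __name__ == "__main__":
    # One 1920x1080 RGB frame (assumed 3 bytes per pixel) in 10 ms.
    print(round(raw_throughput_mb_s(1920, 1080, 3, 0.010), 2))
```

Under those assumptions, one 1080p frame per 10 ms is roughly 622 MB/s of raw pixels per thread. A figure counted in compressed bytes would be smaller by the compression ratio, so numbers quoted under the two conventions are not directly comparable.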

That's not true. libjpeg-turbo is ~50 MB/s last I tried - plus it's not lossless. fjxl and fpnge are basically an order of magnitude faster than that. libjpeg-turbo isn't even the fastest JPEG codec - you should check out the (relatively obscure) libmango - roughly 1 gbps decode on a 2020 MacBook Pro - or nvJPEG for GPU-based JPEG decoding. Supposedly there are even faster GPU-based decoders than nvJPEG, too.

> GPU-based

How does this impact the overall latency of encoding a single image?

I've written an open-source driver for the decoding side of the nvjpg module found in the Tegra X1 (i.e. an earlier hardware revision than the one in the A100).

I did some quick benchmarks against libjpeg-turbo, if that can give you an idea. I expect encoding performance would be similar.

https://github.com/averne/oss-nvjpg#performance

Probably quite a bit, I don't know. The typical use case is to load up thousands of JPEGs at once to get good throughput despite copy overhead. You can see the benchmark against libjpeg-turbo here: https://developer.nvidia.com/blog/leveraging-hardware-jpeg-d...