Hacker News new | ask | show | jobs
by Retr0id 1246 days ago
I agree with the "usual" answer, or more generally, "the layer above". We shouldn't expect every file format to roll its own error detection.

If you truly care about detecting bit-flips in a writer process's memory, that's a very niche use-case - and maybe you should wrap your files in PAR2 (or even just a .zip in store mode!).

99% of in-the-wild PNGs are checksummed or cryptographically signed at a layer above the file format (e.g. as part of a signed software package, or served over SSL).

Edit: Furthermore, the PNG image data is already checksummed as part of zlib (with the slightly weaker adler32 checksum), so the second layer of checksumming is mostly redundant.

1 comments

> We shouldn't expect every file format to roll its own error detection.

On the other hand, why not? If you are dealing with files that are usually 200kB+, putting 4 or 16 bytes towards a checksum is not a big deal and can help in some unusual situations. Even if the decoder ignores it for speed, the cost is very low.

The space cost is negligible, but the time cost for the encoder is real. Since most decoders do verify checksums, you can't just skip it. Take fpng[1] as an example, which tries to push the boundaries of PNG encode speed.

> The above benchmarks were made before SSE adler32/crc32 functions were added to the encoder. With 24bpp images and MSVC2022 the encoder is now around 15% faster.

I can't see the total percentage cost of checksums mentioned anywhere on the page, but we can infer that it's at least 15% of the overall CPU time, on platforms without accelerated checksum implementations.

[1] https://github.com/richgel999/fpng

I didn't infer 15% from the way it was written there. But most platforms these days have some form of CRC32 "acceleration". Adler32 is easy to compute so I'm even less concerned there.

I spent a bunch of time optimising the code in fpnge (https://github.com/veluca93/fpnge), which is often notably faster than fpng (https://github.com/nigeltao/qoir/blob/5671f584dcf84ddb71e28d...), yet checksum time is basically negligible.

Having said that, the double-checksum aspect of PNG does feel unnecessary.

Does 15% more time to encode matter? How much time is spent encoding files vs decoding? That is probably still an negligible amount of compute, out of the total compute spent on PNGs.

Your specific number seem to come from an (old version of) an encoder that has super-optimized encode and not (yet) optimized CRC.

If the 15% didn't matter, the optimization wouldn't have been made.