Hacker News
by jpap 1957 days ago
This post covers a topic I've spent some time with in the past. It is generally a good overview, but unfortunately it gets the idea of "linear RGB" wrong. That means all of the results need some attention, including the Go implementation. Maybe for a part II post?

Each color component (e.g. red) is represented by a value from zero to full intensity. It's easiest to think of it as a number between 0 and 1 in a linear space. You could use a floating point number for that, or a quantized/fixed-point value: for example, the 10-bit quantized value round(r_linear*1023), which lies in the range 0 to 0x3ff.

8-bit RGB color components are "encoded" from their linear version with a transfer curve (aka gamma compression). For sRGB, the curve is a piecewise combination of a linear segment and a power function. A good overview is [1]. There are many different encodings, including sRGB, BT.601, BT.709, etc. Then there's "full range" vs. "video range"... it can get complex pretty quickly.

Because of gamma encoding, an 8-bit R_sRGB red value is not equal to round(r_linear*255). You have to first compress r_linear via the gamma curve, then quantize that 0..1 value to 8 bits. When going in reverse (expanding an 8-bit sRGB value to linear), you generally take R_sRGB/255 to produce a value in the 0..1 range and then apply the inverse gamma curve to get the linear value. These computations can be done in floating point, fixed point, or with lookup tables.

The takeaway is that you can't represent 8-bit sRGB color components in linear form with just 8 bits without losing precision. You need at least 12 bits for linear sRGB, and many implementations just go straight to 32-bit float values for simplicity.
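One way to check the 12-bit figure is to round-trip every 8-bit sRGB code through an N-bit linear store and see whether it comes back unchanged. This is a sketch under simple assumptions (standard sRGB curve, round-to-nearest at every quantization step); quantize and linearRoundTripIsLossless are hypothetical names.

```go
package main

import "math"

func srgbEncode(linear float64) float64 {
	if linear <= 0.0031308 {
		return 12.92 * linear
	}
	return 1.055*math.Pow(linear, 1/2.4) - 0.055
}

func srgbDecode(encoded float64) float64 {
	if encoded <= 0.04045 {
		return encoded / 12.92
	}
	return math.Pow((encoded+0.055)/1.055, 2.4)
}

// quantize rounds a value in [0, 1] to an integer code with the given
// number of bits, e.g. bits=10 gives round(x*1023) in 0..0x3ff.
func quantize(x float64, bits uint) uint32 {
	maxCode := float64(uint32(1)<<bits - 1)
	return uint32(math.Round(x * maxCode))
}

// linearRoundTripIsLossless reports whether every 8-bit sRGB code survives
// decode -> store as N-bit linear -> re-encode to 8-bit sRGB.
func linearRoundTripIsLossless(bits uint) bool {
	maxCode := float64(uint32(1)<<bits - 1)
	for c := 0; c < 256; c++ {
		linear := srgbDecode(float64(c) / 255)
		stored := quantize(linear, bits)
		back := math.Round(srgbEncode(float64(stored)/maxCode) * 255)
		if int(back) != c {
			return false
		}
	}
	return true
}
```

Under these assumptions, 8 and 11 bits both lose the darkest codes (code 1 does not survive at 11 bits), while 12 and 16 bits round-trip all 256 codes.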

These conversions are required whenever you combine (blend) pixels encoded in sRGB: for each pixel operation X, you decode sRGB to linear, perform X, then encode back to sRGB. It's expensive! That's why GPUs offer texture formats that specify a gamma encoding like sRGB, so a pixel shader can blissfully work in linear color, with the conversions done in hardware before and after the shader runs. On the CPU? You have to do it all yourself...

Because of that, many software libraries don't bother with the proper gamma conversion and just compute everything in the logarithmic (gamma encoded) domain. And most of the time, it looks OK! But it really is just a "cheap" approximation, and sometimes it can look quite bad compared to the (proper) linear computation...
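To make the decode, perform X, encode pattern concrete, here is a sketch of a 50% blend done both ways. It assumes the standard sRGB curve and uses a 256-entry decode table (one of the lookup-table approaches mentioned above); decodeLUT, encode8, blendLinear and blendNaive are hypothetical names.

```go
package main

import "math"

// decodeLUT maps every 8-bit sRGB code to its linear value, so decoding is
// a table lookup instead of a math.Pow call per pixel.
var decodeLUT = func() [256]float64 {
	var t [256]float64
	for i := range t {
		v := float64(i) / 255
		if v <= 0.04045 {
			t[i] = v / 12.92
		} else {
			t[i] = math.Pow((v+0.055)/1.055, 2.4)
		}
	}
	return t
}()

// encode8 clamps a linear value, applies the sRGB curve and quantizes to 8 bits.
func encode8(linear float64) uint8 {
	linear = math.Min(math.Max(linear, 0), 1)
	var v float64
	if linear <= 0.0031308 {
		v = 12.92 * linear
	} else {
		v = 1.055*math.Pow(linear, 1/2.4) - 0.055
	}
	return uint8(math.Round(v * 255))
}

// blendLinear decodes both channels, interpolates in linear light
// (the operation X), then encodes the result back to sRGB.
func blendLinear(a, b uint8, t float64) uint8 {
	return encode8((1-t)*decodeLUT[a] + t*decodeLUT[b])
}

// blendNaive interpolates the encoded codes directly: the cheap
// gamma-domain approximation.
func blendNaive(a, b uint8, t float64) uint8 {
	return uint8(math.Round((1-t)*float64(a) + t*float64(b)))
}
```

Mixing black (0) and white (255) at t = 0.5, blendLinear gives 188 while blendNaive gives 128, so the gamma-domain result comes out noticeably darker than a true 50% mix of the two light intensities.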

As far as I can tell, none of the Go standard library does linear blending; and all of the image formats are assumed to be sRGB encoded. There are some 3rd-party packages like [3] that can do some color management on a 16-bit linear image format (RGBA64, i.e. 16 bits per RGBA component).

The other thing the author might consider is revising the "Why?" footnote to the "Random Noise (grayscale)" section. What the author is actually doing there is just using a cheap approximation to a rounding function: round(x) ~= floor(x + 0.5). In general, doing a round like that introduces a bias [2]. That section can be summarized as: after every pixel operation, round and clamp back to the valid range.

[1] https://blog.johnnovak.net/2016/09/21/what-every-coder-shoul...
[2] http://www.cplusplus.com/articles/1UCRko23/
[3] https://github.com/mandykoh/prism


Thanks a lot for this; I was hoping to learn something from this blog post. Correctness is the highest priority for me, so I'm glad to improve my library.

I will update the library to use 16-bit color everywhere (0-65535), and update the blog post to note this.

As for the rounding, that's another great point, and thanks for the link. I will change the library and blog post to round to the even number on ties.

Edit: I've updated the blog post. I'd appreciate it if you could check it out and let me know if I made any mistakes with the update.

Just a small correction: Logarithmic != gamma encoded. Most software uses sRGB encoding, which is close to a power function (often referred to as a gamma function). Logarithmic encoding is often used for encoding HDR images, but it is not what most software uses.

When I referred to the "logarithmic domain", I wasn't talking about a purely log/exp transfer function, but one that is "log-like". Perhaps it is more accurate to say "non-linear domain"... but I hope you get the idea. :)

The sRGB transfer function is piecewise linear + power function, but it can be closely approximated by a simple power function with gamma ≈ 2.2 [1]. Either way, the conversion between linear and non-linear is generally referred to as "gamma correction", even when the transfer function is not a simple power function.
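To get a feel for how close the approximation is, one can compare exact sRGB decoding with a pure power function of 2.2 at every 8-bit code. A sketch, assuming the standard sRGB constants; maxGamma22Error is a hypothetical helper.

```go
package main

import "math"

func srgbDecode(encoded float64) float64 {
	if encoded <= 0.04045 {
		return encoded / 12.92
	}
	return math.Pow((encoded+0.055)/1.055, 2.4)
}

// maxGamma22Error returns the largest absolute difference, in linear units,
// between exact sRGB decoding and a pure power function with gamma 2.2,
// sampled at all 256 8-bit codes.
func maxGamma22Error() float64 {
	worst := 0.0
	for c := 0; c < 256; c++ {
		v := float64(c) / 255
		worst = math.Max(worst, math.Abs(srgbDecode(v)-math.Pow(v, 2.2)))
	}
	return worst
}
```

This measures absolute error in linear units only, which stays below 0.01. The relative difference is much larger near black, where sRGB switches to its linear segment.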

[1] https://en.wikipedia.org/wiki/SRGB#The_forward_transformatio...

sRGB was engineered to be indistinguishable from a power 2.2 function, even though it's harder to calculate.

Even if the article gets the "linear RGB" idea wrong, it doesn't matter for the results, because each channel in all explored palettes is still 1-bit: either on or off, sometimes with the constraint that red and green cannot be on at the same time.

Unfortunately not when you're performing error diffusion, where the residuals are added to neighboring pixels. If you're doing it in the non-linear domain, you're going to get a different result during that diffusion step, even when you're dithering to a target 1-bit/pixel image.

You can see this visually in Surma's excellent blog post [1]: look for the gradient strips in the "Gamma" section.

[1] https://surma.dev/things/ditherpunk/

Good point, perhaps I should've picked some different palettes. But the ideas behind the blog post generalize to all palettes.

> none of the Go standard library does linear blending;

I would understand a C++ library from the 1990s getting this wrong, or some toy project not bothering to implement colour management properly.

But to develop a new programming language for the 2010s to 2020s and blithely assume that images are always 8-bit sRGB is lazy beyond belief...

To put things in perspective, this would be roughly the same as making an application around the same time that simply assumes that the screen resolution is a fixed 1024x768 pixels.