Hacker News new | ask | show | jobs
by derf_ 810 days ago
That was a nice read.

> I also tested only one background image, so the behaviour may differ greatly with QR codes contained in different surrounds.

This likely does not matter much. It could theoretically affect binarization near the edges of the code (near module boundaries, depending on how you did the resizing), but in practice as long as the code itself is high-contrast, this is unlikely. The more usual issue is that real images often do not have a proper quiet zone around the code, but that is mostly going to be irrelevant for what you are trying to test here.

> The QR codes are generated to be perfectly rectangular and aligned to the image pixel grid, which is unlikely to happen in the real world.

This is a much bigger deal. A large source of decoding errors for larger versions (for a fixed "field of view") is due to alignment / sampling issues. A lot of work goes into trying to find where the code is in the image and identify the grid pattern, and that is just inherently less reliable for larger versions, particularly if there is projective distortion (so the module size is not constant). The periodic alignment patterns try to keep the number of parameters that can be used to fit this grid roughly constant relative to the number of modules in the grid, but locating those patterns is itself error-prone and subject to false positives (they are not nearly as unique-looking as finder patterns), and the initial global transform estimate has to get pretty close for them to work. I am actually happy that damaging these was not causing you more trouble. This is definitely somewhere that ZBar can be improved. It currently does not use the timing patterns at all, for example. I'm not actually aware of an open-source QR decoder that does.

(I'm the original author of ZBar's QR decoder)

1 comments

Thanks for the kind words and the insight!
A slightly more common way to express "field of view" is "module pitch", measured in pixels between adjacent module centers. I went back and tried to express the numbers from Figured 6 as a module pitch, and I think it works out to around 1.6 pixels / module. IIRC, the QR code standard recommends a module pitch of at least 4 pixels. So it is nice that ZBar is able to do around 2.5x better before running into issues (a margin that is a lot bigger than the gains from higher EC levels).

In theory there could still be room for improvement. Right now ZBar estimates finder and alignment pattern locations to quarter-pel precision, but rounds each module location to the nearest pixel so it can sample a binarized version of the image to decide the value of that module. At the extreme limits of small module pitch this effectively turns the resampling filter in whatever you are using to resize your QR code image into a nearest neighbor filter. You can see why that would start to cause issues. Imagine a version 7 code (45x45 modules) sized to be 80x80 pixels. Most of your columns will be 2 pixels wide, but with binarization, somewhere in there you have to have 10 columns that are only 1 pixel wide. Good luck lining up your grid to hit all of them perfectly (without looking at the timing pattern, which would likely only help in the perfectly axis-aligned case). Some kind of sub-pixel integration of the original image before thresholding to decide each module value could probably do better. That would make decoding a lot more computationally expensive, though.