Why write another article on JPEG when there are already hundreds of articles on the internet? Well, normally when you read articles on JPEG, the author just gives you details about what the format looks like. You don’t implement any code to do the actual decompression and decoding. Even if you do write code, it is in C/C++ and not accessible to a wide group of people. I tried to change that through this article. I give you a guided tour of JPEG encoding/decoding process and show you how it can be implemented in Python3.
I mainly focus on decoding baseline encoded JPEG images.
It's great to see more people complete the "JPEG decoder challenge" --- and document it too. Have you read the official T.81 spec[1]? It is one of the easier standards to read, and it has flowcharts of all the algorithms which mean you can implement without really understanding the theory.
As you mentioned there are lots of other articles about writing JPEG decoders, as well as for the other popular image formats GIF and PNG. What I haven't seen much of if at all, however, are articles about video decoding; if you're interested to try another challenge, I recommend a decoder for H.261: https://www.itu.int/rec/T-REC-H.261-199303-I/en
(I've written one --- in C --- and it is actually far simpler than JPEG in many ways, although it shares many similarities. A H.261 decoder in Python, with associated article, would definitely be very interesting to see, and AFAIK also a world-first!)
Thanks for the H.261 decoder suggestion! I will actually seriously consider it because I am personally curious how video decoders work as well. I will make sure to document the process if I end up working on it.
As someone who has done it in C, I would both agree and disagree --- reading the file format will definitely be easier since C lets you pick bits/bytes/words/etc. off the stream directly, but on the other hand the high-level structures (looping, etc.) would probably be easier with Python.
Overall, seeing as OP's Python implementation is less than 300 LoC while my C implementation was closer to 750, the Python might be easier. Then again, I followed the algorithms/flowcharts in the spec, which do a faster table-based Huffman decoding than the bit-by-bit of this Python version, and also implemented arbitrary(!) chroma subsampling as the spec allows, so it's not a direct comparison.
The main benefit I see here of Python over C is when your program doesn't work (yet): Python has nicer error handling built straight into the language.
If you just want to present a working implementation, both C and Python can express readable versions.
Easier for whom? Most programmers would not be able to follow that code. On the other hand, a really simple C or Python algorithm is more or less universally readable.
It's not so much about the algorithm, but the parsing itself.
Parser combinators give you one of the most straightforward ways to express parsers.
You are right, that the language you express your parser combinators in vs the language you express an alternative solution in also will have an impact.
You can in theory express parser combinators in Python. But it's gonna be a bit ugly, because Python has relatively clunky syntax for functions.
I just found a parser combinator library for Python https://pypi.org/project/parsita/ But it looks like they have to abuse some Python magic to make the code readable.
I think being executable is really key, because one way I check my understanding of code is to make modifications to the code and see if they do what I expect them to do.
IMO the ideal setup would be for an article to contain pseudocode and to have supplementary executable code. This would allow the article to explain concepts without boilerplate and unnecessary details, but also ensure that those details are available for readers who wish to investigate further.
I still think it's an advantage to write the executable code as close to pseudocode as possible.
I really really like this - especially with all the references you've listed. I think more programmers should have a go at working with various file formats. Well done on this article.
Thanks! And I completely agree with the idea that more programmers should have a go at working with various file formats. I learned so much about binary file layouts through this one project.
Before writing this article I had no idea that JPEGs use Huffman tables. Now all those Huffman table lectures from college algos classes make sense. I was never told in that class that this is a major use case for Huffman tables. Maybe I would have shown more interest then if I knew this fact.
This. I know C, C++ implementations are efficient but to see python implementation of this makes it lot easier to understand.
> You don’t implement any code to do the actual decompression and decoding
This is also true in case of face recognition. Nobody tells you internals and just tells you to use tools like OpenCV. Maybe it's too complicated but a post like OP's for face recognition would be a treat!
It's not that far off going for an MPEG-2 video decoder next, highly recommend (a long time since I did, so take with a pinch of salt). That would be great to see in python to make that realm more accessible.
Equally there are millions who don't use C or C++. I think it’s pretty fair to describe this as a "wide group of people" who would consider C or C++ inaccessible.
For any given programming language, the number of people not using it outweighs the number of people using it by several orders of magnitude.
It’s hard to get accurate numbers, but a quick google estimates there is roughly double the number of Python programmers than C++ programmers. I find it very difficult to imagine there isn’t some overlap there and even more difficult to imagine a competent Python programmer looking at C++ code and being completely unable to read it.
There is definitely a substantial number of people who know Python and have had little to no exposure to C or C++. To the point where they'd really rather see something like this presented in a language they're familiar with and that they can try out in an environment they've already got setup.
There's no harm in having many articles on a given subject (think how many monad posts there are) - if you dislike one you don't have to read it, there's plenty of other interesting stuff out there :-)
Thanks for making this, a couple of months ago I was working on a jpeg project and needed to understand how it works. I was searching wildly for a tutorial with a python implementation, unfortunately never found one. I found this one in C++ though, it goes a lot more in depth https://www.youtube.com/playlist?list=PLpsTn9TA_Q8VMDyOPrDKm...
Thank you for this; lots of details, references and detailed diagrams!
One small note - the first image you mention that we'll need to reference is a bit small to read. Zooming in on the page does not zoom it in due to format of the page. I had to right click -> View image, in a separate tab in order to be able to zoom in enough to read the righthand side of it.
edit: this is on the latest version of Firefox
It always puzzles me that the quantisation matrices don't have symmetry along the diagonal. I guess it suggests the distribution of vertical and horizontal lines in typical pictures differs.
Why write another article on JPEG when there are already hundreds of articles on the internet? Well, normally when you read articles on JPEG, the author just gives you details about what the format looks like. You don’t implement any code to do the actual decompression and decoding. Even if you do write code, it is in C/C++ and not accessible to a wide group of people. I tried to change that through this article. I give you a guided tour of JPEG encoding/decoding process and show you how it can be implemented in Python3.
I mainly focus on decoding baseline encoded JPEG images.