Cool! I like tiny encoders. I teach assembly programming at a university, where one of the final assignments in the course is writing a decoder for something similar to LZ77 in Cortex-M3 (Thumb-2) assembly. The best I could do is 15 instructions / 40 bytes, found here: https://gist.github.com/ManDeJan/fd1c625e3540faa41d03736eb94... .
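For readers who haven't written one, here is a rough sketch of what an LZ77-style decoder loop does. The token format is made up for illustration (it is not the format from the assignment or the gist): a tag byte below 0x80 starts a literal run of `tag + 1` bytes, and any other tag is a back-reference of length `(tag & 0x7F) + 2` followed by a one-byte distance.

```python
def lz77_decode(data: bytes) -> bytes:
    """Decode a hypothetical LZ77-style stream (format assumed, see above)."""
    out = bytearray()
    i = 0
    while i < len(data):
        tag = data[i]
        i += 1
        if tag < 0x80:
            # Literal run: copy the next tag + 1 bytes straight to the output.
            n = tag + 1
            out += data[i:i + n]
            i += n
        else:
            # Back-reference: copy `length` bytes starting `dist` bytes back.
            length = (tag & 0x7F) + 2
            dist = data[i]
            i += 1
            if dist == 0 or dist > len(out):
                raise ValueError("invalid match distance")
            # Copy byte by byte so overlapping matches (dist < length) work.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


# "abc" as literals, then a 6-byte match at distance 3.
packed = bytes([0x02]) + b"abc" + bytes([0x84, 0x03])
print(lz77_decode(packed))
```

The byte-by-byte copy is what lets a short match expand into a long run, which is also why the loop maps so naturally onto a handful of load/store instructions.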
RegPack is a non-standard encoding, but an awesome approach for small chunks (1-4 KB is optimal) of self-contained compressed code, using regular expressions for decoding: https://github.com/Siorki/RegPack
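To give a feel for why the decoder in this kind of packer can be so short, here is a sketch of token-substitution unpacking. This is an assumed, simplified scheme for illustration, not RegPack's actual code, and it uses plain string split/join rather than regular expressions: each token character stands for a repeated substring, and the substring itself is stored once after the last occurrence of the token.

```python
def unpack(packed: str, tokens: str) -> str:
    """Undo a hypothetical token-substitution packing (scheme assumed above)."""
    for t in tokens:
        parts = packed.split(t)
        # The last piece is the text this token stands for.
        replacement = parts.pop()
        # Rejoining the remaining pieces puts it back at every occurrence.
        packed = replacement.join(parts)
    return packed


# "A" stands for "hello world"; the definition follows the final "A".
print(unpack("A AAhello world", "A"))
```

Tokens have to be undone in the reverse of the order they were introduced when packing, and each token must be a character that does not occur in the unpacked text.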
Crinkler [1] is a popular compressor-linker for 1--8 KB demos, and its decompressor (partially overlapping with a PE header) is probably around 100--200 bytes. Later efforts like oneKpaq [2] also have a comparable decompressor size.
If you don't mind a shameless plug and a slightly larger decompressor (about 500 bytes in JS) for better compression, my Roadroller [3] might fit the bill as well.
The compression method in UPX is much simpler (and therefore smaller and faster), but I last looked at it over 10 years ago, so something may have changed. LZMA should provide better compression, though.
I want to make the decoder small enough to be used in compressed executables, where decompression performance doesn't matter much: LZMA is far from fast at decompression anyway, and small code without inlining makes it worse (but bearable).
I have suggested that they look at my version, but I don't think UPX is hardcore enough for that; I'm guessing decompression speed is more important to UPX than the extra 2 KB.