Seems like a slightly unfair comparison. Training the compressor moves information from the training images into the compressor itself, which makes the bits-per-pixel evaluation a bit iffy.
Is this image compression tool good at images it was not trained on?
How bad does it get in those situations?
Is this training data fixed into the codec forever? Will there be slightly different image codecs that have different training data? That would be sort of hellish.
As long as the decompressor needs just the image file and no other data, it's fair game.
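For what it's worth, the two positions above can be put side by side with some quick arithmetic. This is only a rough sketch: the function names, the idea of spreading the model size over a number of images, and every figure in the usage note are made up for illustration, not taken from the codec being discussed.

```python
def bits_per_pixel(compressed_bytes: int, width: int, height: int) -> float:
    """Plain bpp: counts only the bytes in the image file itself."""
    return compressed_bytes * 8 / (width * height)


def amortized_bits_per_pixel(
    compressed_bytes: int,
    width: int,
    height: int,
    model_bytes: int,
    num_images: int,
) -> float:
    """bpp when the trained decoder's size is charged to the images.

    Assumption: the model's bytes are split evenly across `num_images`
    images decoded with the same model.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    total_bytes = compressed_bytes + model_bytes / num_images
    return total_bytes * 8 / (width * height)


if __name__ == "__main__":
    # Hypothetical numbers: a 512x512 image compressed to 16 KiB,
    # decoded by a 100 MB model.
    plain = bits_per_pixel(16_384, 512, 512)
    one_image = amortized_bits_per_pixel(16_384, 512, 512, 100_000_000, 1)
    many_images = amortized_bits_per_pixel(16_384, 512, 512, 100_000_000, 1_000_000)
    print(plain, one_image, many_images)
```

If the model ships once and decodes millions of images, the amortized figure ends up close to the plain one, which is roughly the "fair game" argument. If every image needs its own slightly different model, the model term dominates, which is the hellish case.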