by tonic_section 2099 days ago
During training, you can set a target bitrate by heavily penalizing examples which exceed the target rate in the rate-distortion objective, so the model should learn to produce compressed representations at or below this bitrate. However, this constraint is only enforced in aggregate over the entire dataset. As with many ML systems, there is no guarantee of behaviour for individual examples, either within or outside the training set. Despite this, the model appears to respect the target rate well, even on out-of-sample images.
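
To make the "penalize above the target" idea concrete, here is a minimal sketch of one way to do it: the weight on the rate term switches between a large and a small value depending on whether the measured rate exceeds the target. The function names, the two weights (`lambda_high`, `lambda_low`) and the choice to compare the batch-mean rate against the target are all assumptions for illustration, not the actual training code.

```python
def rate_weight(rate_bpp, target_bpp, lambda_high=2.0, lambda_low=0.0625):
    """Weight on the rate term: large above the target rate, small at or below it.

    lambda_high and lambda_low are illustrative values, not tuned constants.
    """
    return lambda_high if rate_bpp > target_bpp else lambda_low


def batch_rd_loss(rates_bpp, distortions, target_bpp,
                  lambda_high=2.0, lambda_low=0.0625):
    """Rate-distortion loss on a batch, with the penalty keyed on the MEAN rate.

    Because only the mean is compared to the target, a single example can
    exceed the target without triggering the heavy penalty.
    """
    mean_rate = sum(rates_bpp) / len(rates_bpp)
    mean_distortion = sum(distortions) / len(distortions)
    weight = rate_weight(mean_rate, target_bpp, lambda_high, lambda_low)
    return weight * mean_rate + mean_distortion


if __name__ == "__main__":
    target = 0.30
    # Mean rate 0.35 bpp is over target, so the heavy weight applies.
    print(batch_rd_loss([0.20, 0.50], [0.1, 0.1], target))
    # Mean rate 0.275 bpp is under target, so the light weight applies,
    # even though the second example alone (0.45 bpp) is over target.
    print(batch_rd_loss([0.10, 0.45], [0.1, 0.1], target))
```

The second call shows why an aggregate constraint gives no per-image guarantee: the batch as a whole is under the target, so the example at 0.45bpp is barely penalized.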

One shortcoming is that the current model is non-adaptive - which means that the target rate is fixed. So to achieve different target compression rates you would have to train multiple models in different rate regimes. In the Colab demo there is the option to select between 3 different models trained with a target bits-per-pixel (bpp) rate of 0.14bpp, 0.30bpp, and 0.45bpp, respectively - higher rates correspond to higher-fidelity reconstructions, at the expense of a lower compression ratio. The default is the `HiFIC-med` model (and this is what all the samples in the README were generated with), but the model trained at the highest bitrate should have less obvious imperfections.
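
For a rough sense of what those three target rates mean in practice, the arithmetic can be sketched as below. It assumes the uncompressed reference is 24-bit RGB (8 bits per channel) and that the model hits its target rate exactly, which is not guaranteed for any individual image. The helper names and the 768x512 image size are made up for the example.

```python
RAW_BPP = 24.0  # assumption: uncompressed 8-bit RGB, 3 channels


def compression_ratio(bpp, raw_bpp=RAW_BPP):
    """How many times smaller than the raw pixels the compressed output is."""
    return raw_bpp / bpp


def compressed_size_bytes(width, height, bpp):
    """Approximate compressed size in bytes for an image coded at `bpp`."""
    return width * height * bpp / 8


if __name__ == "__main__":
    for target_bpp in (0.14, 0.30, 0.45):
        ratio = compression_ratio(target_bpp)
        size_kb = compressed_size_bytes(768, 512, target_bpp) / 1000
        print(f"{target_bpp:.2f} bpp: ~{ratio:.0f}x smaller than raw, "
              f"~{size_kb:.1f} kB for a 768x512 image")
```

Under these assumptions the lowest-rate model is roughly 170x smaller than the raw pixels and the highest-rate one roughly 53x, which is the fidelity versus compression ratio trade-off between the three models.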

Some of the distortion can also be attributed to the entropy coding process rather than the model itself - currently the system clips values outside a certain probability range, which introduces artificial distortion. A fix is in the pipeline, though.