by dist-epoch 65 days ago
For those wondering where this is practically relevant: KL divergence is the basic metric used to compare quantizations of LLMs, i.e. what the KL divergence is for a 4-bit quantization versus an 8-bit one versus the original 16-bit model.
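To make that concrete, here is a rough sketch of the comparison on a single next-token distribution. Everything in it is made up for illustration: the logits are toy numbers, and `fake_quantize` is a hypothetical stand-in that just rounds logits to a uniform grid. Real quantization schemes quantize the weights, and the KL divergence is then measured between the output distributions of the original and the quantized model, usually averaged over many tokens.

```python
import math

def softmax(logits):
    """Turn logits into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) in nats. Terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def fake_quantize(values, bits):
    """Hypothetical stand-in: round values to 2**bits uniform levels over their range."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    return [lo + round((v - lo) / step) * step for v in values]

# Toy "full precision" logits for one token position (assumed values).
reference_logits = [2.1, 0.3, -1.2, 0.9, -0.4, 1.7, -2.5, 0.05]
p_ref = softmax(reference_logits)

# KL(reference || quantized) for each bit width; lower means closer to the original.
kl_by_bits = {
    bits: kl_divergence(p_ref, softmax(fake_quantize(reference_logits, bits)))
    for bits in (2, 4, 8)
}

for bits, kl in kl_by_bits.items():
    print(f"{bits}-bit: KL = {kl:.6f} nats")
```

The reference distribution goes first in `kl_divergence` because the question is how much is lost by using the quantized distribution in place of the original one.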
1 comments

This is also the original way variational methods pick the parameters of a model with a known architecture: choose the parameterization that best matches the distribution which generated the data, when that distribution is not otherwise compactly expressible.
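A minimal sketch of that idea, with several assumptions of my own: the "known architecture" is a single Gaussian with parameters (mu, sigma), the target is a made-up two-component mixture on a grid, the search is a brute-force grid search rather than gradient descent, and the direction is KL(q || p), which is the convention in variational inference. None of these specifics come from the comment above.

```python
import math

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

def gaussian_on_grid(xs, mu, sigma):
    """Discretized Gaussian: the model family q with parameters (mu, sigma)."""
    return normalize([math.exp(-0.5 * ((x - mu) / sigma) ** 2) for x in xs])

def kl_divergence(p, q):
    """KL(p || q) in nats. Terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Grid over [-6, 6] in steps of 0.1.
xs = [-6 + 0.1 * i for i in range(121)]

# Assumed target: a lopsided two-mode mixture, which no single Gaussian can match exactly.
target = normalize([
    0.3 * math.exp(-0.5 * ((x + 1.5) / 0.6) ** 2)
    + 0.7 * math.exp(-0.5 * ((x - 2.0) / 1.0) ** 2)
    for x in xs
])

# Brute-force search over the parameter grid, minimizing KL(q || target).
best_kl, best_mu, best_sigma = float("inf"), None, None
for i in range(61):            # mu from -3.0 to 3.0
    mu = -3 + 0.1 * i
    for j in range(31):        # sigma from 0.5 to 3.5
        sigma = 0.5 + 0.1 * j
        kl = kl_divergence(gaussian_on_grid(xs, mu, sigma), target)
        if kl < best_kl:
            best_kl, best_mu, best_sigma = kl, mu, sigma

print(f"best mu={best_mu:.1f}, sigma={best_sigma:.1f}, KL={best_kl:.4f} nats")
```

Swapping the arguments to `kl_divergence` gives the other direction, KL(target || q), which generally picks different parameters when the target is outside the model family.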