Hacker News
by semiinfinitely 129 days ago
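an LLM can be used to losslessly compress a string to a size roughly equal to the model's total next-token prediction loss over that string, measured in bits (the sum of -log2 p(token | prefix)), by arithmetic coding each token under the model's predicted distribution. the decoder runs the same model to reproduce the same distributions, so it can invert the code exactly. it's sota compression for the distribution of strings found on the internet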
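a minimal sketch of the idea, with loud assumptions: running a real LLM is out of scope, so `next_token_probs` is a hypothetical stand-in (an adaptive character model with add-one smoothing) that plays the LLM's role of returning a next-token distribution given the prefix. the coder uses exact `Fraction` arithmetic for clarity instead of the fixed-precision integer arithmetic real arithmetic coders use, and the decoder is assumed to know the message length (a real system would use an end-of-text token or a length header). all function names here are made up for illustration.

```python
import math
from fractions import Fraction

# Hypothetical stand-in for an LLM: predicts the next character from counts of
# the characters seen so far (add-one smoothing). Input must use only ALPHABET.
ALPHABET = "abcdefghijklmnopqrstuvwxyz "

def next_token_probs(prefix):
    counts = {c: 1 for c in ALPHABET}
    for c in prefix:
        counts[c] += 1
    total = sum(counts.values())
    return {c: Fraction(n, total) for c, n in counts.items()}

def loss_bits(text):
    """Total next-token prediction loss over text in bits: sum of -log2 p."""
    return sum(-math.log2(next_token_probs(text[:i])[c]) for i, c in enumerate(text))

def encode(text):
    low, width = Fraction(0), Fraction(1)
    for i, sym in enumerate(text):
        probs = next_token_probs(text[:i])
        cum = Fraction(0)  # probability mass of symbols ordered before sym
        for c, p in probs.items():
            if c == sym:
                break
            cum += p
        # Narrow the interval to sym's slice; width shrinks by a factor p(sym).
        low += cum * width
        width *= probs[sym]
    # Emit the shortest k-bit string b whose dyadic interval [b/2^k, (b+1)/2^k)
    # fits inside [low, low + width); this takes fewer than -log2(width) + 2 bits.
    k = 0
    while True:
        k += 1
        b = math.ceil(low * 2**k)
        if Fraction(b + 1, 2**k) <= low + width:
            return format(b, f"0{k}b")

def decode(bits, n):
    value = Fraction(int(bits, 2), 2 ** len(bits))
    low, width = Fraction(0), Fraction(1)
    out = ""
    for _ in range(n):
        # Same model, same prefix, so the same distribution the encoder used.
        probs = next_token_probs(out)
        cum = Fraction(0)
        for c, p in probs.items():
            if value < low + (cum + p) * width:  # value falls in c's slice
                out += c
                low += cum * width
                width *= p
                break
            cum += p
    return out

text = "the cat sat on the mat"
bits = encode(text)
print(len(bits), "bits vs model loss", round(loss_bits(text), 2), "bits")
assert decode(bits, len(text)) == text
```

since the final interval width is exactly the product of the per-token probabilities, the code length lands within 2 bits of the summed -log2 p, so a better predictor (lower loss) directly gives a shorter encoding.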

an insightful video on the topic: https://www.youtube.com/watch?v=dO4TPJkeaaU