Hacker News new | ask | show | jobs
by bravura 539 days ago
antirez, it's probably identical to the approach in this paper: Li et al 2024, "Evaluating Large Language Models for Generalization and Robustness via Data Compression" (https://ar5iv.labs.arxiv.org/html//2402.00861).

There's a pretty straight line from assigning probabilities (to a sequence of tokens) to arithmetic compression as an optimal compression algorithm for that distribution.