If this were true, an extremely small language model would outperform small language models. That sounds like inventing a compression algorithm that shrinks any input by at least one byte.
Sorry for the snark, but the linked article doesn't explain why this obvious inference is false in this case.
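For anyone who hasn't seen why that compressor can't exist, it comes down to a pigeonhole count. A lossless compressor has to be injective. There are 256^n inputs of exactly n bytes, but only 256^0 + ... + 256^(n-1) strings that are at least one byte shorter, so two inputs would have to share an output. This is just my own quick sketch of the compression half of the analogy. The helper names are made up, and it says nothing rigorous about language models:

```python
def outputs_available(n: int, alphabet: int = 256) -> int:
    """Count strings of length 0..n-1 over the alphabet (at least one symbol shorter than n)."""
    return sum(alphabet ** k for k in range(n))


def inputs_of_length(n: int, alphabet: int = 256) -> int:
    """Count strings of exactly length n over the alphabet."""
    return alphabet ** n


def can_shrink_every_input(n: int, alphabet: int = 256) -> bool:
    # A lossless (injective) compressor needs at least as many distinct
    # outputs as inputs. Here the outputs must all be strictly shorter.
    return outputs_available(n, alphabet) >= inputs_of_length(n, alphabet)


for n in range(1, 5):
    print(n, inputs_of_length(n), outputs_available(n), can_shrink_every_input(n))
```

The last column comes out False for every n, and the same holds for any alphabet of size two or more, because (a^n - 1)/(a - 1) < a^n.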