by andreasvc
2756 days ago
Why do you think English would be the least compressible? Is that conjecture, or have you investigated it? And why would an artificial language be more compressible? That seems completely orthogonal to me: by definition, an artificial language can be designed with whatever properties you choose. Fortran may be more compressible due to its limited set of keywords, but my impression is that Ithkuil is, by design, more information-dense and thus harder to compress than English. The most efficient language is the least compressible one only in a narrow and arbitrary sense of "efficient". There are many other considerations: what is efficient for the speaker, what is efficient for the hearer, redundancy as protection against noise, efficiency with respect to particular purposes, and so on. We can assume that natural languages generally make a good trade-off across these factors, so searching for the most efficient language in one particular narrow sense is not very useful. Moreover, compressing text deals only with surface form and completely ignores the dimension of meaning.
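To make "compressible" concrete in the narrow sense discussed here, a minimal sketch, assuming zlib (DEFLATE) as a stand-in for compression in general and using the ratio of compressed size to raw size as the measure. The keyword list and both sample texts are hypothetical toys, not real Fortran source or real language data.

```python
import random
import string
import zlib


def compression_ratio(text: str, level: int = 9) -> float:
    """Compressed size divided by raw size; lower means more compressible."""
    raw = text.encode("utf-8")
    return len(zlib.compress(raw, level)) / len(raw)


# Hypothetical samples: a small, fixed vocabulary of Fortran-like keywords
# versus letters and spaces drawn uniformly at random (same raw length).
rng = random.Random(42)
keywords = ["PROGRAM", "INTEGER", "REAL", "DO", "END", "IF", "THEN", "CALL"]
keyword_text = " ".join(rng.choice(keywords) for _ in range(1000))
random_text = "".join(
    rng.choice(string.ascii_lowercase + " ") for _ in range(len(keyword_text))
)

print(f"keywords:       {compression_ratio(keyword_text):.3f}")
print(f"random letters: {compression_ratio(random_text):.3f}")
```

A small, repetitive vocabulary gives the compressor long repeated substrings to exploit, so the keyword text should come out with the lower ratio. That says nothing about which "language" is better for speakers or hearers, which is exactly the narrowness being objected to above.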
My conjecture is that artificial languages will be more compressible because they haven't had time to be honed down, the way English lost "thee" and "thou", its personal mode of address. Esperanto and Loglan are completely regular, which natural languages are not, so they carry that regularity even into the many use cases where it doesn't matter; they haven't had time to shed their mostly unused features.
For better or worse, text compression uses only the surface form, because that's the level compression operates on: letters, bytes, or some other unit. You can't compress meaning. Meaning doesn't exist per se: colorless green ideas sleep furiously, after all. That is, you can use perfectly sensible words and letters, and even legitimate syntax, and still create strings devoid of meaning. A document of perfectly spelled words in legitimate syntax, yet as meaningless as the colorless-green-ideas sentence, will compress about as well as ordinary text with the same orthographic and syntactic regularities.
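The point that the compressor sees only surface form can be sketched with a toy experiment, assuming zlib (DEFLATE) as the compressor. The grammar and both word lists are hypothetical: the two lexicons fill the same slots with words of identical lengths, one chosen to read sensibly and one chosen to produce Chomsky-style nonsense. Because both corpora are generated from the same random seed, they share exactly the same sentence structure, and only the "meaning" differs.

```python
import random
import zlib

# Hypothetical toy grammar: the ADJ NOUN VERB the ADJ NOUN ADV.
SENSIBLE = {
    "ADJ": ["old", "young", "brown"],
    "NOUN": ["farmer", "cow", "dog"],
    "VERB": ["fed", "milked", "chased"],
    "ADV": ["slowly", "calmly"],
}
# Same slots, same list sizes, same word lengths, but the sentences mean nothing.
NONSENSE = {
    "ADJ": ["wet", "angry", "green"],
    "NOUN": ["theory", "sky", "pun"],
    "VERB": ["ate", "folded", "melted"],
    "ADV": ["loudly", "gently"],
}


def generate(lexicon: dict, n_sentences: int, seed: int = 0) -> str:
    """Build grammatical sentences; the same seed picks the same slot indices."""
    rng = random.Random(seed)
    sentences = []
    for _ in range(n_sentences):
        words = [
            "the", rng.choice(lexicon["ADJ"]), rng.choice(lexicon["NOUN"]),
            rng.choice(lexicon["VERB"]),
            "the", rng.choice(lexicon["ADJ"]), rng.choice(lexicon["NOUN"]),
            rng.choice(lexicon["ADV"]),
        ]
        sentences.append(" ".join(words) + ".")
    return " ".join(sentences)


def ratio(text: str) -> float:
    """Compressed size divided by raw size."""
    raw = text.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)


sensible = generate(SENSIBLE, 200)
nonsense = generate(NONSENSE, 200)
print(sensible[:60])
print(nonsense[:60])
print(f"sensible: {ratio(sensible):.3f}  nonsense: {ratio(nonsense):.3f}")
```

The two ratios should land very close together, since DEFLATE only exploits repeated byte sequences and symbol frequencies, which the word-for-word substitution preserves. Small differences can still appear from accidental matches inside particular words, so "about as well as" is a fairer claim than "identically".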