by kqr
738 days ago
Maybe 10 bits is the average over the dictionary, which is what matters here, but over normal text it is significantly less. Our best current estimate for relatively high-level text (texts published by the EU) is 6 bits per word [1]. However, as our methods of predicting text improve, this number gets revised down. LLMs ought to have made a serious dent in it, but I haven't looked up any newer results. Anyway, all of this is to say that which words are chosen matters, but how they are put together matters perhaps more.

[1]: http://arxiv.org/pdf/1606.06996
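To make the dictionary-versus-text distinction concrete, here is a rough Python sketch. It assumes a hypothetical 1024-word dictionary (which is where a figure like 10 bits would come from) and a toy sentence I made up. It only uses word frequencies, so it ignores context entirely; the 6-bit estimate in [1] comes from far stronger methods than this, and the function names are mine, not from the paper.

```python
import math
from collections import Counter


def uniform_bits(vocab_size):
    """Bits per word if every word in the dictionary is equally likely."""
    return math.log2(vocab_size)


def unigram_bits(words):
    """Shannon entropy (bits per word) of the observed word frequencies.

    This treats each word as independent of its neighbours, so it is only
    an upper-bound-style illustration, not a real estimate for a language.
    """
    counts = Counter(words)
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Assumed dictionary size: 2**10 words, picked uniformly at random.
print(uniform_bits(1024))

# Toy text (an assumption for illustration): 13 words, 8 distinct.
sample = "the cat sat on the mat and the dog sat on the rug".split()
print(unigram_bits(sample), "vs", uniform_bits(len(set(sample))))
```

Because "the" shows up far more often than "rug", the frequency-based figure comes out below the uniform one for the same 8 distinct words. Conditioning on the preceding words, which is what better predictors do, pushes the number lower still.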