You can use LLMs as compressors, and I wonder how it would go with that.
The approach is simple: Turn the file into a stream of tokens. For each token, ask a language model to generate the full set of predictions based on context, and sort based on likelihood. Look where the actual token appears in the sorted list. Low entropy symbols will be near the start of the list, and high entropy tokens near the end.
I suspect most language models would deal with your alphabet example just fine, while still correctly spotting passwords and API keys. It would be a fun experiment to try!
I mean, it certainly has a low Kolmogorov complexity (which is what I would really want to be measuring somehow for this tool... note that I am not claiming that is possible: just an ideal); I am unsure whether how that affects the related bounds on Shannon entropy, though.
A quick Google search suggests English has about 10 bits of entropy per word. Having a long password like that can still have high total entropy I suppose, but it has a low entropy density.
Maybe 10 bits is the average over the dictionary – which is what matters here, but over normal text it is significantly less. Our best current estimation for relatively high-level text (texts published by the EU) is 6 bits per word[1].
However, as our methods of predicting text improve, this number is revised down. LLMs ought to have made a serious dent in it, but I haven't looked up any newer results.
Anyway, all of this to say is that which words are chosen matters, but how they are put together matters perhaps more.
The diceware method is supposed to generate totally random words, so it should fundamentally be unpredictable unless there's a flaw in the source of randomness.