by daemonologist
739 days ago
Entropy of information is roughly a measure of how poorly it compresses: the less compressible the data, the higher its entropy. Random noise barely compresses at all and so has high entropy, whereas written natural language can usually be compressed quite a bit. Since many passwords and tokens are randomly generated, or at least nonsense, looking for high entropy might pick up on them. This package seems to measure entropy by counting the occurrences of each character in each line and ranking lines with a high proportion of repeated characters as low entropy. I don't know how closely this corresponds to the precise definition.
Source: https://github.com/EwenQuim/entropy/blob/f7543efe130cfbb5f0a... More: https://en.wikipedia.org/wiki/Entropy_(information_theory)
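For anyone who wants to see the idea concretely, here is a minimal sketch of per-line character entropy. It assumes the textbook Shannon formula over character frequencies, which may or may not match what the package actually computes; the function name `shannon_entropy` is made up for illustration.

```python
import math
from collections import Counter


def shannon_entropy(line: str) -> float:
    """Shannon entropy of the character frequencies in `line`, in bits per character.

    Hypothetical helper for illustration, not the package's actual code.
    """
    if not line:
        return 0.0
    total = len(line)
    counts = Counter(line)
    # H = sum over distinct characters of p * log2(1 / p), with p = count / total.
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# A line made of one repeated character scores 0 bits per character,
# while a line where every character is distinct scores log2(length).
low = shannon_entropy("aaaaaaaa")
high = shannon_entropy("abcdefgh")
```

Lines that score high on a measure like this are the ones such a tool would flag as possible secrets.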
|
It also fails for passphrases like 'correct horse battery staple', which have enough total entropy to be good passwords but low entropy per character.
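A rough back-of-the-envelope version of that point, as a sketch. The numbers are assumptions for illustration: four words drawn uniformly from a 2048-word list, compared with an 8-character token drawn uniformly from a 64-symbol alphabet. The helper name `generation_entropy_bits` is hypothetical.

```python
import math


def generation_entropy_bits(choices: int, picks: int) -> float:
    """Bits of entropy when `picks` items are drawn uniformly and independently
    from `choices` options. Hypothetical helper for illustration."""
    return picks * math.log2(choices)


passphrase = "correct horse battery staple"  # 28 characters including spaces

# Assumption: 4 words from a 2048-word list, so 11 bits per word.
passphrase_bits = generation_entropy_bits(2048, 4)
# Assumption: 8 characters from a 64-symbol alphabet, so 6 bits per character.
token_bits = generation_entropy_bits(64, 8)

passphrase_bits_per_char = passphrase_bits / len(passphrase)
token_bits_per_char = token_bits / 8

print(f"passphrase: {passphrase_bits:.0f} bits total, {passphrase_bits_per_char:.2f} bits/char")
print(f"token:      {token_bits:.0f} bits total, {token_bits_per_char:.2f} bits/char")
```

Under these assumptions the totals are in the same ballpark, but the passphrase spreads its bits over far more characters, so a per-character threshold tuned for random tokens would pass right over it.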