Hacker News
by tarblog 3927 days ago
In the spec [1], they published the dictionary in hexadecimal form. Can someone explain why there are so many 6s in it?

For comparison, here's a chart of each hex character and its approximate [2] number of occurrences:

digit:    0   1   2   3   4   5   6   7   8   9   a   b   c   d   e   f
count:  15k 10k 19k 14k 15k 16k 62k 32k 10k 11k 11k  7k  8k 12k 20k  9k

[1]: http://www.ietf.org/id/draft-alakuijala-brotli-05.txt
[2]: counted with find-in-page over the whole page; I didn't bother to restrict the search to the dictionary.
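
Footnote [2] notes that the counts came from find-in-page over the whole spec rather than just the dictionary. A minimal Python sketch of a dictionary-only count, assuming the dictionary's hex lines have already been copied out of the draft into a string (the function name `hex_digit_counts` and the sample input are hypothetical, not from the spec):

```python
from collections import Counter

HEX_DIGITS = "0123456789abcdef"


def hex_digit_counts(hex_text: str) -> Counter:
    """Count how often each hex digit (0-9, a-f) appears in hex_text.

    Case is ignored and every other character (spaces, newlines,
    punctuation) is skipped. Letters a-f inside ordinary words would be
    counted too, so pass in only the dictionary's hex lines.
    """
    return Counter(ch for ch in hex_text.lower() if ch in HEX_DIGITS)


if __name__ == "__main__":
    # Hypothetical sample, not from the spec: "hello world" as hex.
    sample = "68656c6c6f20776f726c64"
    counts = hex_digit_counts(sample)
    print(counts["6"], counts["7"])  # 8 3
    # Print the full table in the same 0-f order as the chart above.
    for digit in HEX_DIGITS:
        print(digit, counts[digit])
```

Skipping non-hex characters keeps line breaks and spacing in the copied dump from mattering, but it cannot tell hex digits apart from the letters a-f in surrounding prose, which is why the input should be limited to the dictionary itself.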

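The 6s and 7s correspond to lowercase ASCII letters: 'a' through 'o' are 0x61-0x6F and 'p' through 'z' are 0x70-0x7A, so the first hex digit of every lowercase letter is 6 or 7. Try decoding the hex strings as UTF-8...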
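
The reply's point can be checked with a small Python sketch. The helper names `decode_hex` and `high_hex_digits` are hypothetical, and the sample bytes are made up rather than copied from the dictionary; the only library calls used are `bytes.fromhex`, `bytes.decode`, and `str.encode`:

```python
import string


def decode_hex(hex_text: str) -> str:
    """Decode hex digits as UTF-8 text.

    bytes.fromhex skips spaces between byte pairs. errors="replace"
    substitutes U+FFFD instead of raising if a copied chunk cuts a
    multi-byte UTF-8 character in half.
    """
    return bytes.fromhex(hex_text).decode("utf-8", errors="replace")


def high_hex_digits(text: str) -> set:
    """Return the set of first hex digits of text's UTF-8 bytes."""
    # b >> 4 keeps the high nibble, i.e. the first digit of the byte in hex.
    return {format(b >> 4, "x") for b in text.encode("utf-8")}


if __name__ == "__main__":
    print(decode_hex("74 69 6d 65"))  # time
    # Lowercase ASCII runs from 0x61 ('a') to 0x7a ('z').
    print(sorted(high_hex_digits(string.ascii_lowercase)))  # ['6', '7']
    print(sorted(high_hex_digits(string.ascii_uppercase)))  # ['4', '5']
```

The uppercase line is there only for contrast: text made mostly of lowercase letters pushes the counts for 6 and 7 far above the other digits, while uppercase letters would land on 4 and 5 instead.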