| HN Mirror

The first issue is that comparisons have to be apples-to-apples. You can't assume that method A gets some piece of information for free while method B has to pay for it, or that method A assumes one source model while method B assumes a different one.

Second, I really don't understand how you intend to use a table of symbol counts. If you build it over the entire file, the table might be a reasonable size, but the number of permutations consistent with those counts becomes so large that indexing one of them is infeasible. Conversely, if you build it over small windows (8 symbols or so, as in your examples), you have to store a separate symbol count table for every window, and the total size of those tables explodes. I really doubt you are gaining anything from doing this. You are going to create an enormous per-file symbol frequency table and then not count it against the compressed size. That isn't compression, it's just misdirection.
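To make the accounting concrete, here is a rough sketch of what an honest size estimate could look like. It is not your scheme, just my guess at its general shape: I'm assuming a 256-symbol byte alphabet, a fixed-length code that identifies each window's count table, and an index that picks one arrangement among all those sharing the counts. The function names are made up for illustration.

```python
import math
from collections import Counter

def permutation_index_bits(counts):
    """Bits needed to pick one arrangement among all that share these counts."""
    arrangements = math.factorial(sum(counts))
    for c in counts:
        arrangements //= math.factorial(c)  # multinomial coefficient, exact
    return (arrangements - 1).bit_length()  # ceil(log2(arrangements))

def count_table_bits(window_len, alphabet_size=256):
    """Bits needed to say which count table a window has (fixed-length code).

    The number of possible tables is the number of ways to split window_len
    among alphabet_size symbols (stars and bars).
    """
    tables = math.comb(window_len + alphabet_size - 1, alphabet_size - 1)
    return (tables - 1).bit_length()

def total_bits(data, window_len, alphabet_size=256):
    """Honest total: every window pays for its table AND its index."""
    total = 0
    for start in range(0, len(data), window_len):
        window = data[start:start + window_len]
        counts = list(Counter(window).values())
        total += count_table_bits(len(window), alphabet_size)
        total += permutation_index_bits(counts)
    return total

window = bytes(range(8))        # 8 distinct bytes, 64 bits raw
print(total_bits(window, 8))    # table bits + index bits for that one window
```

Under those assumptions a window of 8 distinct bytes costs 49 bits for the table plus 16 bits for the permutation index, so 65 bits against 64 bits raw. The 16-bit index only looks like a win if you leave the table out of the total.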