by foobarqux 727 days ago
First, comparisons have to be apples-to-apples: you can't assume that method A gets that information for free while method B does not, or that method A assumes one source model while method B assumes a different one.

Second, I really don't understand how you intend to use a table of symbol counts. If you build it over the entire file, the table might be a reasonable size, but the number of permutations becomes infeasible. Conversely, if you build it over small windows (like 8 or so in your examples), you have to store a separate symbol count table for each window, which would blow up the total size of the tables. I really doubt you are gaining anything from doing this. You are going to create an enormous per-file symbol frequency table and then not count it against the compressed size; that isn't compression, it's just misdirection.
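
To put rough numbers on that accounting, here is a minimal sketch. It assumes the scheme stores, per window, a flat count table plus an index selecting one arrangement among all permutations of those counts; that is my reading of the approach, not something confirmed in the thread. The function names, the flat-table layout, and the 256-symbol alphabet are all hypothetical.

```python
import math
from collections import Counter

def permutation_index_bits(counts):
    """Bits needed to pick one arrangement out of all distinct
    permutations of a multiset with the given symbol counts."""
    counts = list(counts)
    n = sum(counts)
    # log of the multinomial coefficient n! / (c1! * c2! * ...)
    ln_perms = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)
    return ln_perms / math.log(2)

def table_bits(alphabet_size, window_len):
    """Bits for a flat count table (assumed layout): one fixed-width
    counter per symbol, wide enough to hold any value in 0..window_len."""
    return alphabet_size * window_len.bit_length()

def total_bits(data, window_len, alphabet_size=256):
    """Table cost plus permutation-index cost, summed over all windows."""
    total = 0.0
    for start in range(0, len(data), window_len):
        window = data[start:start + window_len]
        counts = Counter(window).values()
        total += table_bits(alphabet_size, len(window))
        total += permutation_index_bits(counts)
    return total
```

Under these assumptions, a single 8-byte window over a 256-symbol alphabet needs 256 * 4 = 1024 bits for the table alone, against 64 bits for the raw window. A sparse or compressed table would shrink that, but whatever it costs still has to be counted against the output.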

1 comment

When I get time I may compress the table as well, then. Looking at example arithmetic encoders, they seem to have some flat table implementations as well (https://github.com/nayuki/Reference-arithmetic-coding/blob/m...). I don't see how that's misdirection; I left the table uncompressed specifically for transparency reasons. I was just trying to keep that part simple.