by foobarqux
724 days ago
I don't think you understand the basic concepts, and you are wasting your time trying to "fix" it. Let me try to explain a bit differently.

The Shannon limit depends on assuming a model of the source (the thing producing symbols). For example, imagine the output of something that you think is "true" noise, like a fair coin flip that produces 0 (for heads) and 1 (for tails). That source has an entropy of 1 bit per flip and you cannot compress it. Now you learn that it is actually a pseudorandom number generator that produces 0 and 1 in a 1:1 ratio. If you know the program (and the seed, if required), the source now has 0 entropy: you can run the program to know every bit that will be produced. Same symbol outputs, different models of how they are produced, different theoretical compression limits.

As another example, if you are compressing text, one model would be to assume that words are produced independently of one another. We know that isn't true (certain words are more likely to follow other words, which is how the next-word prediction on your phone works), but you might still choose to implement a compression system based on that naive model because it is less complex, uses less memory, takes less compression/decompression time, etc.

If you implemented compression using a less naive model (e.g. what is the probability of the next symbol in all human text, given all the symbols I have seen before this one), you could get better compression rates. But you wouldn't be breaking the Shannon limit, even though you beat the limit you calculated using the naive model, because the "true" limit is based on the "true" model of the source, i.e. the entropy of human-produced text. That is not really calculable (people produce new sentences every day), but you might approximate it using something like all the human text you can find.

As I alluded to above, note that Shannon entropy doesn't take into consideration the practical complexities (memory, program size, processing time) that are real considerations when implementing a compression scheme. Most of the time you are going to accept a higher-entropy model (i.e. worse compression) in exchange for simpler, faster, less resource-intensive compression.

To sum up: you cannot beat the Shannon limit of the "true" model of the source, but you can beat the Shannon limit of a naive approximation of the source. Those naive models are used because they are more practical to implement.
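To make the "same symbols, different model" point concrete, here is a minimal Python sketch. The function names (`order0_entropy`, `order1_entropy`, `prng_bits`) and the toy inputs are made up for illustration, and the numbers are empirical estimates from one sample rather than the entropy of any real source. It estimates bits per symbol under a naive model (symbols independent) and under a slightly less naive one (each symbol conditioned on the previous one), and shows that a seeded PRNG looks like 1 bit per symbol to the naive model even though anyone holding the seed can regenerate every bit.

```python
import math
import random
from collections import Counter, defaultdict


def order0_entropy(symbols):
    """Bits per symbol under a model that treats symbols as independent."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def order1_entropy(symbols):
    """Bits per symbol under a model that conditions on the previous symbol."""
    followers_by_context = defaultdict(Counter)
    for prev, cur in zip(symbols, symbols[1:]):
        followers_by_context[prev][cur] += 1
    bits = 0.0
    for followers in followers_by_context.values():
        n = sum(followers.values())
        for c in followers.values():
            bits -= c * math.log2(c / n)
    # Average over the symbols that actually have a predecessor.
    return bits / (len(symbols) - 1)


def prng_bits(seed, n):
    """Hypothetical 'coin flip' source that is really a seeded PRNG."""
    rng = random.Random(seed)
    return [rng.getrandbits(1) for _ in range(n)]


if __name__ == "__main__":
    text = "ab" * 50
    # Naive model: 'a' and 'b' are equally likely, so 1 bit per symbol.
    print(order0_entropy(text))  # 1.0
    # Context model: the next symbol is fully determined by the previous one.
    print(order1_entropy(text))  # 0.0

    bits = prng_bits(1234, 10_000)
    # Looks like a fair coin to the naive model (close to 1 bit per symbol)...
    print(order0_entropy(bits))
    # ...but with the program and the seed, every bit can be regenerated.
    print(bits == prng_bits(1234, 10_000))  # True
```

The context model gets `"abab..."` down to 0 bits per symbol without "breaking" anything; it just shows that the 1 bit figure was the limit of the naive model, not of the source.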