| It's worse than that; the input itself is only 8-bit and they scale up, and then back down again to 8-bit on the output side. But it works, for the same reason that audio engineers use 64-bit signal pipelines inside their DAW even though nearly all output equipment (and much input equipment) is 16-bit which is already at the limit of human perception. If you have a 16-bit signal path, then every device on the signal path gets 16 bits of input and 16-bits of output. So every device in the path rounds to the nearest 16-bit value, which has an error of +/- 0.5 per device. However if you do a lot of those back to back they accumulate. If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16. Ideally some of them will cancel out, but in the worst case it's actually off by 16. "off by 16" is equivalent to "off by lg2(16)=4 bits". So now you don't have 16-bit audio, you have 12-bit audio, because 4 of the bits are junk. And 12-bit audio is no longer outside the limit of human perception. Instead if you do all the math at 64-bit you still have 4 bits of error but they're way over in bits 60-64 where nobody can ever hear them. Then you chop down to 16-bit at the very end and the quality is better. You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits. tl;dr rounding errors accumulate inside the encoder |
Nitpick: 16-bit fixed point is not at the limit of human perception. It's close, but I think 18-bit is required for fixed point. Floating point is a different issue.
> If you have 32 steps in your signal path and each is +/- 0.5 then the total is +/- 16.
Uncorrelated error doesn't accumulate like that. It accumulates as RSS (root of sum of squares). So, sqrt(32 * (.5 * .5)) which is about 2.82 (about 1-2 bits).
> You can have a suuuuuuuper long signal path that accumulates 16 or 32 or 48 bits of error and nobody notices because you still have 16 good bits.
Generally the thing which causes audible errors are effects like reverb, delay, phasors, compressors, etc. These are non-linear effects and consequently error can multiply and wind up in the audible range. Because error accumulates as RSS, it's really hard to get error to additively appear in the audible range.
tl;dr recording engineers like to play with non-linear effects which can eat up all your bits