It's the combination of dropped wireless packets and audio that has to be degraded to 8-bit/8 kHz to pass between carriers that really makes calls sound terrible. It's a bit like TCP over TCP [1]: it works well enough most of the time, but when it fails, it fails badly, because the two layers weren't designed to work with each other.
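To make the 8-bit/8 kHz step a little more concrete, here is a minimal sketch. It assumes the carrier interconnect uses G.711 μ-law, which the comment doesn't actually name. It uses the continuous μ-law curve rather than G.711's segmented, bit-exact tables, and the function names are hypothetical. It pushes a 440 Hz test tone through one encode/decode pass and measures the signal-to-noise ratio. A single pass on its own is tolerable. The damage typically compounds when that pass sits between two lossy cellular codecs.

```python
import math

MU = 255.0   # mu-law parameter (the North American/Japanese G.711 variant)
RATE = 8000  # samples per second, i.e. the "8kHz" in 8bit/8kHz


def mulaw_encode(x: float) -> int:
    """Compress a sample in [-1, 1] to an 8-bit code (0..255) with the continuous mu-law curve."""
    x = max(-1.0, min(1.0, x))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)  # y in [-1, 1]
    return int(round((y + 1.0) / 2.0 * 255))


def mulaw_decode(code: int) -> float:
    """Expand an 8-bit code back to a sample in [-1, 1] (inverse of the curve above)."""
    y = code / 255.0 * 2.0 - 1.0
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def snr_db(clean, degraded):
    """Signal-to-noise ratio of the degraded signal relative to the clean one, in dB."""
    signal = sum(s * s for s in clean)
    noise = sum((s - d) ** 2 for s, d in zip(clean, degraded))
    return 10 * math.log10(signal / noise)


# One second of a 440 Hz tone at half of full scale.
tone = [0.5 * math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
roundtrip = [mulaw_decode(mulaw_encode(s)) for s in tone]

print(f"bitrate: {RATE * 8 / 1000:.0f} kbit/s")
print(f"SNR after one 8-bit mu-law pass: {snr_db(tone, roundtrip):.1f} dB")
```

The logarithmic curve spends more of the 256 codes on quiet samples, so the quantization error stays roughly proportional to the signal level instead of being fixed.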
Compression, largely. EVRC and GSM (the CDMA and GSM codecs, respectively) are both heavily compressed, though I prefer the sound of EVRC. Another thing worth pointing out is that most smartphones are pretty lousy telephones: bad earpieces, poor microphones, and so on. If you want to hear how well a cellular handset can perform, use a good communications-grade handset.
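For a rough sense of how heavily compressed these codecs are, the arithmetic below compares bits per 20 ms frame. The figures are the full-rate modes as I understand them: GSM Full Rate at 260 bits per frame and EVRC rate 1 at 171 bits per frame. EVRC is variable-rate and drops well below this during pauses, and many GSM networks now use AMR instead, so treat these as illustrative assumptions rather than measurements.

```python
FRAME_MS = 20  # both codecs work on 20 ms frames; 160 samples at 8 kHz

# Bits per 20 ms frame (assumed full-rate figures; see the lead-in).
codecs = {
    "G.711 (8-bit @ 8 kHz)": 160 * 8,
    "GSM Full Rate": 260,
    "EVRC (rate 1)": 171,
}
LINEAR_16BIT = 160 * 16  # uncompressed 16-bit PCM at 8 kHz, for comparison


def kbps(bits_per_frame: int) -> float:
    """Bits per millisecond equals kilobits per second."""
    return bits_per_frame / FRAME_MS


for name, bits in codecs.items():
    ratio = LINEAR_16BIT / bits
    print(f"{name:24s} {kbps(bits):6.2f} kbit/s  ({ratio:4.1f}x smaller than 16-bit PCM)")
```

On these numbers, the cellular codecs carry roughly a tenth to a fifteenth of the bits of plain 16-bit PCM, which is why so much depends on the codec's model of speech.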
[1]: http://sites.inka.de/bigred/devel/tcp-tcp.html