Telephone hold music is never going to sound that good, at least not if you are listening to it through the narrow bandwidth of an analog phone line or the highly compressed signals used for cellular and VoIP.
The codecs and sampling rates used for phone communication are optimised for the usual range (E2 to C4 depending on voice type) and overtones of the human speaking voice. The usual sampling rate for phone lines is 8 kHz, which means that only frequencies up to 4 kHz are captured. This is sufficient for making vocal communication intelligible (if not exactly aesthetically pleasing).
Music typically goes beyond that range. When music is sent over a phone line, the overtones in particular are cut off, which makes it sound flat and creaky.
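To put rough numbers on the overtone cut-off, here is a minimal sketch. It assumes the 8 kHz sampling rate quoted above and treats the 4 kHz Nyquist limit as a hard wall (real phone filters roll off a bit lower); the function name and the example notes are just for illustration.

```python
# Which harmonics of a musical note fit under the Nyquist limit of a sampled channel?
SAMPLE_RATE_HZ = 8000              # narrowband telephony rate quoted above
NYQUIST_HZ = SAMPLE_RATE_HZ / 2    # highest frequency the channel can represent


def surviving_harmonics(fundamental_hz, max_harmonic=20, limit_hz=NYQUIST_HZ):
    """Return the harmonic numbers whose frequency lies below limit_hz."""
    return [n for n in range(1, max_harmonic + 1) if n * fundamental_hz < limit_hz]


# A4 (440 Hz): harmonics 1 through 9 fit (9 * 440 = 3960 Hz), the 10th (4400 Hz) does not.
print(surviving_harmonics(440.0))

# A6 (1760 Hz): only the fundamental and the 2nd harmonic (3520 Hz) fit.
print(surviving_harmonics(1760.0))
```

The higher the note, the fewer overtones make it through, which is roughly why bright instruments suffer the most.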
What I also notice in Australia is that when hold music plays, the volume slowly fades toward zero. I worry that I've lost reception, but when they pick up it's fine. I also notice the phone seems to "know" it's on hold, as there is an audible cue, so perhaps the carrier is deprioritising on-hold audio.
A better question is why telephone quality is bad in general. I assume it's a bandwidth issue related to the technological limitations of the time when it was rolled out.
Whenever I do a FaceTime voice call I'm always shocked by the quality of the audio.
IIRC, the cell phone voice codecs run at an absurdly low bitrate (on the order of 12 kbps? AMR narrowband tops out at 12.2 kbps) and are highly optimized for human voice only, so voice is acceptable and everything else sounds like trash.
Analog landlines used 64kbps codecs (after the analog loop to your house) that were flexible enough to be abused to transport 56kbps of data.
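For what it's worth, the 64 kbps figure falls straight out of 8000 samples per second times 8 bits per sample. A rough sketch of that arithmetic follows, together with mu-law companding, assuming the codec meant here is G.711-style PCM. Note that this uses the continuous mu-law formula, not the bit-exact segmented encoding from the G.711 spec, and the helper names are made up for illustration.

```python
import math

SAMPLE_RATE_HZ = 8000
BITS_PER_SAMPLE = 8
bitrate_bps = SAMPLE_RATE_HZ * BITS_PER_SAMPLE  # 64000 bits per second

MU = 255  # companding parameter used by mu-law


def mu_law_compress(x):
    """Continuous mu-law curve: maps x in [-1, 1] to [-1, 1], boosting quiet samples."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + MU) ** abs(y) - 1) / MU, y)


def encode_8bit(x):
    """Clamp, compand, then quantise to one of 256 levels (0..255)."""
    y = mu_law_compress(max(-1.0, min(1.0, x)))
    return int(round((y + 1) / 2 * 255))


def decode_8bit(code):
    """Map an 8-bit code back to an approximate sample in [-1, 1]."""
    return mu_law_expand(code / 255 * 2 - 1)


print(bitrate_bps)
quiet = 0.01
print(quiet, decode_8bit(encode_8bit(quiet)))  # small signals survive with small error
```

The companding spends most of the 256 levels on quiet samples, which suits speech but does nothing about the 4 kHz ceiling.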
The reason there is hold music at all is to let you know the phone didn't disconnect. Cheap sound fulfills this, so why spend money doing anything better?
The problem is that, unlike the Internet, which delivers your bits unmodified, the telephone network is lossy, and there is no standard for how much loss is acceptable.
Even if all your equipment and your telco's equipment is perfect (which it isn't), all it takes is one single weak link in the call chain (from a cheap carrier somewhere) to completely slaughter the audio quality by passing it through some bad equipment or even analog equipment.
The telco industry is the total opposite of the Internet industry. On the net, most of the technologies, standards, etc. are open, there is mostly a culture of sharing and openness, and while proprietary solutions exist, they are often much better as far as complying with standards goes, and open-source solutions are competitive as well (any Linux or BSD box can be used as a router).
The telco world is the opposite: there is a lack of openness, the standards, if they exist, are often under NDA, there is lots of security by obscurity, and a lot of scammers sell proprietary gear ("magic boxes" as I call them) that respects the standards to the bare minimum or sometimes doesn't even try (which leads to lots of fun when two different magic boxes both claim to support the standard and yet fail in mysterious ways when trying to interoperate). An example would be a magic box that only supports G.711 (a crappy narrowband codec: 64 kbps, but only about 3.4 kHz of audio bandwidth), so even if the call chain is all-IP and all the other boxes support better codecs (or better yet, just handle SIP signalling and let the peers talk directly via IP), the call quality would be limited by that one magic box that insists on G.711. The lack of openness and knowledge prevents people from making an informed decision, and they have no choice but to trust the salesman that their magic box is best. Open-source solutions aren't up to scratch either: Asterisk (and derivatives) is a step in the right direction but frankly doesn't really scale and has problems when you try to make it highly available.
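A toy model of the weakest-link effect, as a sketch only: it pretends the whole chain has to agree on a single codec, whereas real chains may transcode hop by hop (which is lossy in its own way). The preference order, the chains, and the `negotiate` helper are all hypothetical; PCMU is the usual RTP name for G.711 mu-law.

```python
# Hypothetical preference order, best first. The names are illustrative labels.
PREFERENCE = ["opus", "G722", "PCMU"]


def negotiate(chain):
    """Return the best codec supported by every hop that touches the media, or None."""
    common = set(PREFERENCE)
    for hop_codecs in chain:
        common &= set(hop_codecs)
    for codec in PREFERENCE:
        if codec in common:
            return codec
    return None


# Every box in this chain speaks Opus, so the call can stay wideband.
good_chain = [["opus", "G722", "PCMU"], ["opus", "PCMU"], ["opus", "G722", "PCMU"]]
print(negotiate(good_chain))

# Add one magic box that only does G.711 and the whole call drops to it.
bad_chain = good_chain + [["PCMU"]]
print(negotiate(bad_chain))
```

If the boxes only handled signalling and stayed out of the media path, the chain would shrink to the two phones and the one bad box would not matter.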
I wish the telco world would just die and we moved to SIP with media directly over IP; that would eliminate all those problems. SIP is already used within VoLTE and Wi-Fi calling, but a magic box within the carrier's network (the P-CSCF as they call it) will often still insist on re-encoding the media stream instead of letting the packets flow between the two phones directly, so you're still at the mercy of whatever codecs that box supports. Calling outside your carrier will involve another magic box that bridges your SIP call to the other carrier's network via legacy garbage E1/T1 links with crappy codecs (instead of, you know, just using SIP and letting the packets flow directly).
If you have good equipment throughout the entire call chain, that ideally does not interfere with the media itself (so the RTP stream is direct between the two phones), and the phones both support a good codec (Opus?), you could have MP3-quality calls.