HBM would have its bandwidth reduced significantly to stay in sync with the rest of the system. That's ignoring how the timings for everything would be ruined, which is a pretty important and hard-to-manage thing for RAM.
You use HBM or GDDR if you want very high bandwidth, like top-end discrete GPUs get. But then the memory itself is more expensive, and you want the external channels to reduce cost, so the OS bloat can go in the cheap memory and the limited amount of high-cost memory is preserved for what needs it.
This is notably not what Apple does -- they're just using ordinary LPDDR5 with a wide bus, equivalent to having a lot of memory channels. It gets them several hundred GB/s worth of bandwidth, similar to a midrange discrete GPU. If you were going to do that, you could put most of the channels within the package and still have two of them outside of it.
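To put rough numbers on the "wide bus" point, here is a back-of-the-envelope sketch of theoretical peak bandwidth as bus width times transfer rate. The specific configurations (a 512-bit LPDDR5-6400 bus and a 128-bit dual-channel DDR5-5600 setup) are illustrative assumptions, not figures from the post above, and real sustained bandwidth will be lower than these peaks.

```python
def peak_bandwidth_gbs(bus_width_bits: int, transfer_rate_mts: int) -> float:
    """Theoretical peak bandwidth in GB/s (1 GB = 10^9 bytes).

    bus_width_bits    -- total data bus width across all channels
    transfer_rate_mts -- transfers per second, in millions (MT/s)
    """
    bytes_per_transfer = bus_width_bits / 8
    # bytes/transfer * 10^6 transfers/s, divided by 10^9 bytes per GB
    return bytes_per_transfer * transfer_rate_mts / 1000


# Assumed example configurations, for illustration only.
configs = {
    "wide on-package LPDDR5-6400, 512-bit": (512, 6400),
    "desktop dual-channel DDR5-5600, 128-bit": (128, 5600),
}

for name, (width, rate) in configs.items():
    print(f"{name}: {peak_bandwidth_gbs(width, rate):.1f} GB/s")
```

The point is just that the same kind of memory gets you several times the bandwidth if you widen the bus, without needing HBM or GDDR.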
That sort of configuration would allow some flexibility. The on-package memory might have lower latency (if they're both just ordinary DDR, the difference will be small, if any). If you configured the system to interleave only between the on-package memory channels, the "close" memory could achieve that lower latency. Interleaving the external channels into the same pool would have a small latency hit but increase bandwidth by e.g. 25%. This could be configured in UEFI based on your expected workload.
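As a rough sketch of where a figure like 25% could come from: assume 8 on-package channels plus 2 external ones, all with the same per-channel bandwidth. The channel counts, the 51.2 GB/s per-channel figure, the 256-byte interleave granule, and the function names are all hypothetical, chosen only to make the arithmetic concrete. Real memory controllers typically hash address bits rather than using a plain modulo.

```python
def pool_bandwidth_gbs(on_package: int, external: int,
                       per_channel_gbs: float, include_external: bool) -> float:
    """Peak bandwidth of the interleaved pool, assuming identical channels."""
    channels = on_package + (external if include_external else 0)
    return channels * per_channel_gbs


def channel_for_address(addr: int, num_channels: int,
                        granule_bytes: int = 256) -> int:
    """Simple round-robin interleave: consecutive granules go to consecutive channels."""
    return (addr // granule_bytes) % num_channels


# Hypothetical system: 8 on-package channels, 2 external, 51.2 GB/s each.
close_only = pool_bandwidth_gbs(8, 2, 51.2, include_external=False)
everything = pool_bandwidth_gbs(8, 2, 51.2, include_external=True)
gain_percent = (everything / close_only - 1) * 100

print(f"on-package only: {close_only:.1f} GB/s")
print(f"all channels:    {everything:.1f} GB/s (+{gain_percent:.0f}%)")

# With 10-way interleave, roughly 2 of every 10 granules land on the
# external (slightly slower to reach) channels, hence the small latency hit.
print([channel_for_address(a, 10) for a in range(0, 256 * 12, 256)])
```

In the close-only mode the external channels would presumably still be usable as a separate, slower pool rather than sitting idle, which is the flexibility being described.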