That whole theory is blown by the fact that CoreAudio supports third-party audio hardware. There is a thriving market of soundcards and audio peripherals for the professional Mac OS X market. Those all run on CoreAudio, which is exactly what iOS runs.
This suggests that you can focus on a single driver and so save developer time. However, the drivers on Android are written by a much larger workforce, which has to be taken into account.
> they can skip one of the abstraction layers
The post explicitly states that the HAL ought to add no latency at all.
Beside the point. When you know the hardware, you can shave away the upper layers.
Consider e.g. AmigaOS.
AmigaOS let you obtain a pointer directly to the screen bitmap to update your window contents with no buffering or clipping. It could do that because originally all the hardware was the same, or close enough.
Then graphics cards came along, and you didn't necessarily have a way of writing directly to the bitmap. Suddenly you had to use WritePixel() and ReadPixel() and similar, which would obtain the screen pointer for the window, and obtain the display the screen is on, and find the driver corresponding to the screen, and call the appropriate driver function via a jump table.
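To make the difference in the two paths concrete, here is a rough sketch in Python. It is a toy model, not AmigaOS code: `poke_pixel`, `write_pixel_via_driver`, `read_pixel_via_driver` and the `Window`/`Screen`/`Display`/`Driver` classes are all made-up names for illustration, and the 320x200 one-byte-per-pixel layout is an assumption chosen for simplicity.

```python
# Toy model only: every name here is hypothetical, not a real AmigaOS API.
WIDTH, HEIGHT = 320, 200  # assumed size, one byte per pixel for simplicity

# Path 1: known, fixed hardware. The application holds the bitmap itself
# and a pixel write is a single indexed store.
chip_bitmap = bytearray(WIDTH * HEIGHT)

def poke_pixel(bitmap, x, y, colour):
    bitmap[y * WIDTH + x] = colour

# Path 2: arbitrary graphics cards. The application never sees the memory;
# each call walks window -> screen -> display -> driver -> jump table.
class Driver:
    def __init__(self):
        self.vram = bytearray(WIDTH * HEIGHT)  # private to the driver
        self.jump_table = {
            "write_pixel": self._write_pixel,
            "read_pixel": self._read_pixel,
        }

    def _write_pixel(self, x, y, colour):
        self.vram[y * WIDTH + x] = colour

    def _read_pixel(self, x, y):
        return self.vram[y * WIDTH + x]

class Display:
    def __init__(self, driver):
        self.driver = driver

class Screen:
    def __init__(self, display):
        self.display = display

class Window:
    def __init__(self, screen):
        self.screen = screen

def write_pixel_via_driver(window, x, y, colour):
    screen = window.screen            # screen the window lives on
    display = screen.display          # display the screen is on
    driver = display.driver           # driver for that display
    driver.jump_table["write_pixel"](x, y, colour)  # indirect call

def read_pixel_via_driver(window, x, y):
    driver = window.screen.display.driver
    return driver.jump_table["read_pixel"](x, y)

if __name__ == "__main__":
    poke_pixel(chip_bitmap, 10, 5, 7)
    win = Window(Screen(Display(Driver())))
    write_pixel_via_driver(win, 10, 5, 7)
    print(chip_bitmap[5 * WIDTH + 10], read_pixel_via_driver(win, 10, 5))
```

Both paths end up storing the same byte, but the second pays for three lookups and an indirect call on every pixel, and that cost is what buys independence from the hardware.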
Similarly, the AmigaOS had functions to e.g. install copper lists (the copper was a very primitive co-processor that could be used to do things like change the palette at specific scan lines), which also wouldn't work at all on graphics cards.
This is why knowing that the hardware is part of a limited set matters: you can define your API to match the hardware very precisely, or even expose hardware features directly.