Yes, most smartphones actually have two CPUs - the application processor, which runs the OS, web browsing and so on, usually on some monolithic kernel, and the baseband processor driving the radio chip, which typically runs a microkernel or a small RTOS. They don't have much to do with each other.
I think the main reason for the separation is the real-time constraints of the radio codecs - if your web browsing slows down because you received an email, it's a minor annoyance, but if you stop sending radio packets because you received an email, you probably lose the connection or worse.
Many kernels can give the necessary hard real-time guarantees today. Honestly, I think that baseband chips run their own operating systems because it's illegal to sell open software radios in a lot of countries, and a cellphone where you could tweak the code running on the baseband chip would essentially be one.
The radio CPU can be optimized for its job, so it may have extra instructions for error detection, decryption, etc. It may also be on a separate bus with the antenna peripherals, freeing up a lot of bandwidth for the main CPU.
It should be possible to do all this on a single CPU (real-time is no problem - just use an RTOS), but it would be expensive and eat a lot of power.
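To put a rough number on the "just use an RTOS" point: a minimal sketch of the classic Liu and Layland utilization test for rate-monotonic scheduling, which is one standard way to check whether a set of periodic tasks can share a single CPU without missing deadlines. The task costs and periods below are made-up illustrations, not measurements from any real baseband, and the test is only a sufficient condition (a task set that fails it may still be schedulable).

```python
def rm_utilization_bound(n):
    """Liu-Layland bound: n periodic tasks are schedulable under
    rate-monotonic priorities if total utilization <= n * (2^(1/n) - 1)."""
    return n * (2 ** (1 / n) - 1)


def total_utilization(tasks):
    """tasks is a list of (worst_case_cost, period) pairs in the same time unit."""
    return sum(cost / period for cost, period in tasks)


def passes_rm_test(tasks):
    """Sufficient (not necessary) schedulability check for one CPU."""
    return total_utilization(tasks) <= rm_utilization_bound(len(tasks))


# Hypothetical numbers, in milliseconds: (worst-case cost, period).
light_radio = [(1, 5), (10, 40)]   # radio slot handler + UI work
heavy_radio = [(4, 5), (10, 40)]   # radio handler eating most of each slot

print(passes_rm_test(light_radio))
print(passes_rm_test(heavy_radio))
```

With the light radio load the two tasks fit comfortably under the bound; with the heavy one the total utilization goes above 100%, so no scheduler can save you on a single core. That is roughly the trade-off: one CPU works, but it has to be fast enough for the worst case of both jobs at once.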
The other reason they split up the baseband and the main CPU is regulatory - the baseband is the only part talking to the radio, so it's the only part which needs to be thoroughly tested for regulatory compliance. This lets them upgrade the other parts more frequently without having to go through as much testing and approval.
I'd say that's like arguing that "My language is better than yours at doing 'Hello, world!'" This isn't a hard problem, and even poor solutions can solve easy problems.