(Good) podcasts are not recorded at anything approaching the normal limits of understanding. They're paced for good conversation or storytelling, and their hosts know that the vast majority of listeners are multitasking, not giving them their full attention.
I'd say less transmitting, and more preparation of the message.
"Please move" and "Get the fuck out of the goddamn way" both communicate the same information, one a bit more colorfully.
Establishing and maintaining context, desired action, and desired outcome take (well, for me anyway) a substantial amount of time. Part of that is figuring out what the desired outcome actually is, and part is encoding it in a way that will be well received.
Yeah, the sender side is probably the main bottleneck. Just consider how often people utter filler phonemes like "uhm": they're so common that you might not notice them unless you're looking for them. They are basically placeholders in the data stream, indicating that it isn't over yet but there's a delay producing the next item.
In contrast, consider a listener who is as focused and invested as the producer: they don't often indicate that their own buffer is full or request a repeat. While a listener may say "hold up" or "run that by me again", it's usually for reasons other than word rate (for example, to prompt the producer to try another encoding, or to express disbelief or contempt).