Yeah, I feel like this needs to grab a screenshot from a security camera first, or perhaps ask the person why they are there and post an auto-transcribed version of their request to Slack.
I think they would definitely also want to send ~5 seconds of the raw audio, since freely available speech-to-text probably wouldn't recognize local business names. But either way it's pretty inconvenient to ring a doorbell and wait ~30 seconds. You never realize how valuable a receptionist is until you don't have one.