| We recently tried to build something similar for outbound calls (simple reminders to partners) and ran into massive issues using gpt-4o-realtime-audio: noise detection, turn detection, random telephony issues (we were using Twilio too), the prompt not holding together, and more. We dropped the project because it would have resulted in a terrible experience for the person on the other end of the phone. Building these things is non-trivial. The plan was to A/B test and see how people responded (watching NPS and business-metric uplift). Human handoff was always part of the plan in case things got too tricky for the LLM to handle. I see some hostility here towards this project, and while I share many of the concerns, it is very naive to think that these services won’t be massively leveraged going forward. An AI agent can handle things as well as humans (not in our case, but there are good services out there, e.g. Parloa), and the key elements are the same as in all other agentic workflows: narrow use cases, and a human in the loop ready to pick up/steer/correct. We will see a lot more of
this, and as LLM capabilities improve it will only get better. It is inevitable at this point and might (_might_) result in a better experience for customers in some cases. Nevertheless, I also see the possibility that we will go full circle and always reach for a human, maybe even showing up in person at a physical office to make sure cases or requests are handled well… or not :-) |