by tkgally 823 days ago
Talking with an LLM by voice feels very different to me from text-based chat.

I used the spoken interface of ChatGPT 4 a lot a few months ago, after it was added to the iPhone app, and it was pretty immersive. The latency was somewhat long, though, and even when prompted to reply briefly, the bot tended to ramble on, often with numbered lists, which sound awkward in speech.

For the past couple of weeks, I’ve been experimenting with Inflection AI’s Pi. Its voices are very natural—the American female voice I use even has vocal fry [1]—and the latency is short. It will talk about serious topics (sometimes with numbered lists), but it seems prompted mainly for friendly conversation. It calls me by my name and remembers our previous conversations. I can easily see people becoming emotionally attached to bots like that.

A man named Chris Cappetta has created some open-source software for talking with Claude 3. His conversations with the bot about AI are pretty remarkable [2, 3].

The current spoken interfaces all seem to run what the user says through a speech-to-text converter, so the bot receives only the transcribed words and does not perceive pronunciation, intonation, hesitation, etc. Once multimodal models that can hear and respond to the speaker’s tone become available, the experience will become even stickier.

[1] https://en.wikipedia.org/wiki/Vocal_fry_register

[2] https://www.youtube.com/watch?v=fVab674FGLI

[3] https://www.youtube.com/watch?v=gY9-1isnARs