by bonamiko 867 days ago
I am working on building out a better voice interface for LLMs.

It is still a work in progress (early beta), but you can check it out at https://www.bonamiko.com

Currently I have mainly been using it as a tandem conversation partner for a language I'm learning, but it can be used for many more things. As it is right now, you can use it to bounce ideas off, practice interviews, and get answers to quick general questions. You just need to tell it what you want.

The stack is a Next.js application hosted on Vercel using Supabase for the backend. (There is also some plumbing in AWS for email and DNS.) It is automatically deployed via GitHub Actions.

1 comments

Very cool, just signed up. What advantages does this have over the one built into the ChatGPT app? Also, it would be great if I could see the text output in addition to the voice.
The main differences come down to OpenAI treating voice more like a party-trick demo than a core feature. I think it has a lot of potential if I can just smooth out a couple of rough edges. (When you chat with someone in person, you don't pull out notebooks and write messages to each other. I see writing as a fallback medium.)

To answer your question more specifically,

Pro Bonamiko:

  - Faster average first response latency (but higher first audio latency since OpenAI uses a ding). This is the main focus currently, reducing latency as much as I can. I'd like to be able to avoid the ding, but we'll see how low I can get it.
  - Can be used anywhere with a browser, including on desktop; OpenAI requires an installed mobile app.
  - In the future we can support deeper customization since we are focused on the audio medium. As soon as you have to run a function in the ChatGPT app there is a long response latency, which could easily be fixed by something as simple as the AI saying "Let me perform a search to get the details."

Pro ChatGPT:

  - Nice animation
  - Already has built-in tool support such as web search
  - Supports language switching automatically between messages; Bonamiko requires manually changing the language
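
To make the function-call point above concrete, here is a rough sketch of the "say something while the tool runs" idea. It is not Bonamiko's actual code, just an illustration under assumptions: `answer_with_filler`, `run_tool`, and `speak` are hypothetical names, and the search and speech functions are stand-ins for a real web search and text-to-speech playback.

```python
import asyncio

FILLER = "Let me perform a search to get the details."


async def answer_with_filler(run_tool, speak, filler=FILLER):
    """Speak a short filler line while a slow tool call is in flight."""
    # Schedule the slow tool call so it can progress while the filler plays.
    tool_task = asyncio.create_task(run_tool())
    # The user hears the filler right away instead of silence.
    await speak(filler)
    # Wait for the tool to finish, then speak its result.
    result = await tool_task
    await speak(result)
    return result


async def demo():
    spoken = []

    async def fake_search():
        await asyncio.sleep(0.05)  # stand-in for a slow web search
        return "Here is what I found."

    async def fake_speak(text):
        spoken.append(text)  # stand-in for text-to-speech playback

    await answer_with_filler(fake_search, fake_speak)
    return spoken


print(asyncio.run(demo()))
```

The tool call is started before the filler is spoken, so the filler does not add to the total wait; it only fills the gap the user would otherwise hear as silence.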