Hacker News
by umaar 1171 days ago
Made something similar recently, but for WhatsApp: https://chatbling.net/

What behaviour would users prefer when uploading a voice message: a) the voice message is transcribed (speech to text), or b) the voice message is treated as a query, so you receive a text answer to your voice query?

I've done a) for now as mobile devices already let you type with your voice.

4 comments

I'd quite like a Twilio script I could host that enables voice-to-voice with ChatGPT over a phone call. For messaging apps, though (I'm going to try yours, though I'd prefer Signal), I'd personally prefer to stick with typing and use Apple's transcription (the default microphone on the iOS keyboard) for any voice stuff, still wanting text back.

This is (in addition to the fact that Apple's works pretty well for me) mostly because that way I get to see the words appear as I'm speaking, and can fix any problems in real time rather than waiting until I've finished leaving a voice note to find out it messed up. With Bing AI chat, for example, trying to use the microphone button just leads to frustration as it regularly fails to understand me. But maybe Whisper is so good that I'd hardly ever need to care about errors?

I do suspect I'm an outlier in terms of how I use dictation, checking as I go - at least based on family members, they seem to either speak a sentence then look at it, or speak and then send without looking - so for them, off-device transcription would probably be welcome as long as it even slightly improves accuracy rates.

I see my server has restarted a few times! I imagine it's folks here since I haven't shared Chat Bling elsewhere yet. Sorry to anyone who started generating images but hasn't received a response. The 'jobs' for image generation are stored entirely in memory, so a server restart loses all of them.

Going forward, I'll explore storing image jobs in Redis or something similar, which will be more resilient to server crashes.
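
A minimal sketch of what persisting image jobs might look like, so that a restart can pick up pending work. Everything here is an assumption for illustration: the `imagejob:` key prefix, the job fields, and the function names are hypothetical and not taken from Chat Bling. The `client` argument is assumed to behave like a redis-py client (`set`, `get`, `delete`, `scan_iter`).

```python
import json
import uuid

JOB_PREFIX = "imagejob:"  # hypothetical key namespace, not from the thread


def save_job(client, chat_id, prompt):
    """Persist a pending image job so it survives a process restart."""
    job_id = uuid.uuid4().hex
    job = {"id": job_id, "chat_id": chat_id, "prompt": prompt, "status": "pending"}
    client.set(JOB_PREFIX + job_id, json.dumps(job))
    return job_id


def complete_job(client, job_id):
    """Remove a job once the image has been sent back to the user."""
    client.delete(JOB_PREFIX + job_id)


def pending_jobs(client):
    """Return every stored job; intended to be called on startup to resume work."""
    jobs = []
    for key in client.scan_iter(match=JOB_PREFIX + "*"):
        raw = client.get(key)
        if raw is not None:  # the key may vanish between the scan and the get
            jobs.append(json.loads(raw))
    return jobs
```

With redis-py installed, `client` could be something like `redis.Redis(decode_responses=True)`; on startup the server would loop over `pending_jobs(client)` and re-run or re-queue each one.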

As for conversation history, I'll continue to keep that in memory for now (messages are evicted after a short time period, or if messages consume too many OpenAI tokens) - even that's lost during a server restart/crash. Feels like quite a big decision to store sensitive chat history in a persistent database, from a privacy standpoint.
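
For anyone curious, the eviction described above (drop messages after a short time, or when the history uses too many tokens) could be sketched roughly like this. The class name, the limits, and the 4-characters-per-token estimate are all assumptions for illustration, not how Chat Bling actually does it.

```python
import time
from collections import deque


class ConversationHistory:
    """In-memory history for one chat, evicted by age and by token budget."""

    def __init__(self, max_age_seconds=900, max_tokens=2000):
        # Both limits are illustrative defaults, not values from the thread.
        self.max_age_seconds = max_age_seconds
        self.max_tokens = max_tokens
        self._messages = deque()  # entries: (timestamp, role, text, tokens)
        self._total_tokens = 0

    @staticmethod
    def estimate_tokens(text):
        # Rough heuristic (about 4 characters per token); a real tokenizer
        # would give exact counts.
        return max(1, len(text) // 4)

    def add(self, role, text, now=None):
        now = time.time() if now is None else now
        tokens = self.estimate_tokens(text)
        self._messages.append((now, role, text, tokens))
        self._total_tokens += tokens
        self._evict(now)

    def _evict(self, now):
        # Drop the oldest messages while they are too old
        # or while the token budget is exceeded.
        while self._messages and (
            now - self._messages[0][0] > self.max_age_seconds
            or self._total_tokens > self.max_tokens
        ):
            _, _, _, tokens = self._messages.popleft()
            self._total_tokens -= tokens

    def messages(self, now=None):
        """Return the surviving messages, oldest first."""
        now = time.time() if now is None else now
        self._evict(now)
        return [{"role": role, "content": text} for _, role, text, _ in self._messages]
```

One consequence of this design: a single message larger than the whole budget evicts itself, so a real version would probably truncate it instead.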

You could have a default "will be wiped after <x time>" policy / notification up front, plus an option to change this in either direction: one way to "only store this in RAM not the DB, and wipe it as soon as I close this window - or maybe after an hour of inactivity", the other way to "please never delete (we reserve the right to delete anyway but will keep for at least Y days/months/whatever)". And also a "delete now" button to override. And then a cron job checking what's due to be deleted and wiping it from the DB/memory?
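
Roughly what I have in mind, as a sketch only. The policy names, the TTL values, and the plain dict standing in for the DB/memory are all made up for illustration; "<x time>" is deliberately left as a placeholder constant.

```python
import time
from dataclasses import dataclass
from enum import Enum


class Retention(Enum):
    RAM_ONLY = "ram_only"  # never written to the DB, wiped after inactivity
    DEFAULT = "default"    # wiped a fixed time after creation
    KEEP = "keep"          # kept until the user (or the operator) deletes it


# Illustrative values; the actual durations are left open above.
DEFAULT_TTL_SECONDS = 24 * 3600
RAM_ONLY_IDLE_SECONDS = 3600


@dataclass
class Conversation:
    chat_id: str
    retention: Retention = Retention.DEFAULT
    last_active: float = 0.0
    created: float = 0.0


def is_due(conv, now):
    """Decide whether a conversation should be wiped at time `now`."""
    if conv.retention is Retention.KEEP:
        return False
    if conv.retention is Retention.RAM_ONLY:
        return now - conv.last_active > RAM_ONLY_IDLE_SECONDS
    return now - conv.created > DEFAULT_TTL_SECONDS


def sweep(store, now=None):
    """Delete every due conversation; meant to be run by a cron job or timer."""
    now = time.time() if now is None else now
    due = [chat_id for chat_id, conv in store.items() if is_due(conv, now)]
    for chat_id in due:
        del store[chat_id]
    return due


def delete_now(store, chat_id):
    """The "delete now" override: remove the conversation regardless of policy."""
    return store.pop(chat_id, None) is not None
```

Here `store` is just a dict keyed by chat ID; with a real database the sweep would become a delete query over the same conditions.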

Of course, it may also add more pressure to keep the server secure without private conversations being accessible after a reboot...

Agreed, giving the user a choice would be best here. Something tells me most users would not change it from whatever the default is, but yeah still good to expose this as a setting which should be doable. Thanks for the input!
Np - and you're probably right that I'm in the minority of people who'd care about having as much granular control as possible... maybe most people would rather something closer to a browser's privacy mode, so just a toggle on and off between very private and don't care about private?
How did you get Meta to approve? Been trying for so long.
This is very cool. I tested it with a quick reminder request and it seemed to work. I'm a bit terrified by the privacy issue though. Combining OpenAI with WhatsApp seems like a marriage made in hell.

I guess the only solution will be to move to local bots and models on the phone which will interface out only when needed.

dude how did you get Meta to approve your WA Business? I couldn't get verified after like two weeks of trying and gave up :(