by tikkun 1008 days ago
I'm interested in the same thing and have spent quite a bit of time looking.

Rewind.ai is ok (transcription accuracy is meh)

Voice Memos.app is ok (though no native transcription, and requires stopping and starting)

Otter.ai is ok (though there's a 4 hour limit on recordings, and there's no paid plan that allows for enough recording minutes to do 24/7)

My ideal solution would be that Otter comes out with a Pro 24/7 plan with 60,000 minutes per month and no max recording length, for $60-80/mo.

I would pay for this, and I have paid for alternatives. For privacy and trust reasons, though, I'd prefer either an established company that I've used for a while and that has lots of users, or a small startup that publishes security reports and does everything on device.

As an aside:

I use 24/7 voice transcription as a kind of "extended context window" (to use an LLM analogy). While I'm working, I talk out loud to myself about what I'm thinking through, which I find effectively expands my working memory well beyond what it would otherwise be. It's quite helpful.

2 comments

Do you think an open-source solution that uses only the Deepgram API and does not store any recordings would satisfy your privacy requirements?

How many hours per day or month do you actively use speech recognition?

60,000 minutes per month. I had to double-check my calculations: a 30-day month has only 43,200 minutes. It seems you've found a 30th hour in your day.

Let me give you some context:

I saw your blog post about Deepgram. They charge $0.0059 per minute for pay-as-you-go.

- If you use it 24/7, it costs:
    - $8.496 per day
    - $254.88 per month (30 days)
- If you use it 8 hours a day (with voice activity detection), it costs:
    - $2.832 per day
    - $84.96 per month (30 days)

I know the 24/7 cost is well above your budget ($60-80 a month). But voice activity detection, which lets you transcribe only the audio that actually contains speech, can bring the bill close to that range.
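
For anyone who wants to plug in their own numbers, here is a rough sketch of the arithmetic above. It assumes the $0.0059 per minute rate quoted earlier and a 30-day month; the function names are just illustrative, not part of any Deepgram tooling.

```python
# Back-of-the-envelope transcription cost calculator.
# Assumptions: flat per-minute pricing and a 30-day month.
RATE_PER_MINUTE = 0.0059  # USD, the pay-as-you-go rate quoted above
DAYS_PER_MONTH = 30       # assumption: 30-day month


def daily_cost(hours_per_day, rate=RATE_PER_MINUTE):
    """Cost in USD of transcribing `hours_per_day` hours of audio in one day."""
    return hours_per_day * 60 * rate


def monthly_cost(hours_per_day, rate=RATE_PER_MINUTE, days=DAYS_PER_MONTH):
    """Cost in USD of transcribing `hours_per_day` hours every day for a month."""
    return daily_cost(hours_per_day, rate) * days


def minutes_per_month(hours_per_day=24, days=DAYS_PER_MONTH):
    """Number of minutes recorded in a month at `hours_per_day` hours a day."""
    return hours_per_day * 60 * days


def hours_within_budget(budget, rate=RATE_PER_MINUTE, days=DAYS_PER_MONTH):
    """Hours of audio per day that a monthly budget (USD) can cover."""
    return budget / (rate * 60 * days)


if __name__ == "__main__":
    for hours in (24, 8):
        print(f"{hours}h/day: ${daily_cost(hours):.3f} per day, "
              f"${monthly_cost(hours):.2f} per month")
    print(f"Minutes in a 30-day month: {minutes_per_month()}")
    for budget in (60, 80):
        print(f"${budget}/month covers about "
              f"{hours_within_budget(budget):.1f} hours of audio per day")
```

Under these assumptions, a $60-80 budget covers roughly 5.6 to 7.5 hours of transcribed audio per day, so how much you save depends on how much of the day voice activity detection filters out.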

About privacy and trust, open-sourcing the solution might give you some confidence. Deepgram is backed by YC and has many users, which might also make you feel better.

Out of curiosity, what do you then do with the transcriptions? You said you typically talk out loud while working, but do you continue working like normal after the transcription is recorded for later use, or do you interrupt your workflow to do something specific with the transcription immediately after?

> Out of curiosity, what do you do with the transcriptions after you record them?

I use them as a dictation tool. I speak out what I want to write and then I use a language model to polish it later.

> You mentioned that you usually talk out loud while working, but do you keep working as usual after you save the transcription for future use?

Yes, I continue working as "normal". But you see, there's this slight concern that if I keep talking all day, every day, someone might reserve a spot for me in a mental asylum.

> Do you ever stop your work to do something with the transcription right away?

Sometimes, yes. If I'm writing a specific message, I might pause my work to polish it immediately. But if I'm just voicing my random thoughts, I would like to access them later to write my messages or posts.