Show HN: Slave – local dictation and TTS for macOS (3k words free)
(slave.bot)

2 points by mesadb 205 days ago

Slave is a macOS app for voice in, voice out. Dictate in most languages and it types into any app. Listen back with local Piper TTS. The first 3,000 words are free, then it's $6.99/month. Next: it will join meetings, transcribe them, and write short notes.
Later: lightweight Obsidian-style notes built from your text. Built on Whisper + Piper, and it runs on your machine. Feedback on UX, speed, and pricing is welcome.
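For anyone curious what "local Piper TTS" involves, here is a minimal sketch that pipes text into the Piper command-line tool and writes a WAV file. It assumes a `piper` binary on PATH and a downloaded `.onnx` voice. The helper names `piper_command` and `speak_to_file` are hypothetical and are not taken from Slave's code.

```python
import subprocess
from pathlib import Path


def piper_command(model_path: str, output_wav: str) -> list[str]:
    """Build the argv for Piper's CLI (assumes a `piper` binary on PATH)."""
    return ["piper", "--model", model_path, "--output_file", output_wav]


def speak_to_file(text: str, model_path: str, output_wav: str) -> Path:
    """Synthesize `text` into a WAV file; Piper reads the text from stdin."""
    subprocess.run(
        piper_command(model_path, output_wav),
        input=text.encode("utf-8"),
        check=True,  # raise CalledProcessError if Piper exits with an error
    )
    return Path(output_wav)
```

An example call would be `speak_to_file("Hello", "en_US-lessac-medium.onnx", "out.wav")`, using one of Piper's published voices as an illustration. You could then play the file on macOS with `afplay out.wav`.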
My goal was “press hotkey, start talking, see text within ~1–2 seconds” on an M2 MacBook Pro, with support for multiple languages.
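Here is a rough way to reason about a ~1–2 second goal with chunked transcription. A word spoken at the start of a chunk waits for the whole chunk to be recorded. It then waits for transcription, and finally for any fixed overhead such as audio conversion or typing into the app.

The sketch below treats transcription time as a fixed number. That is a simplification, because Whisper's runtime grows with chunk length and model size. The example numbers are placeholders, not measurements from this post.

```python
def worst_case_latency(chunk_s: float, process_s: float, overhead_s: float = 0.0) -> float:
    """Seconds from speaking the first word of a chunk to seeing its text."""
    # The first word waits for the full chunk, then transcription, then overhead.
    return chunk_s + process_s + overhead_s


def max_chunk_for_target(target_s: float, process_s: float, overhead_s: float = 0.0) -> float:
    """Longest chunk (seconds) that still meets target_s; 0.0 if the target is unreachable."""
    return max(0.0, target_s - process_s - overhead_s)
```

Take a 2.0 s target and an assumed 0.5 s transcription time: `max_chunk_for_target(2.0, 0.5)` returns 1.5. In other words, chunks longer than 1.5 s would miss the target even if everything else were free.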
First attempts (cloud):
- I tried Hugging Face real-time transcription. It worked, but latency was all over the place and the costs would not scale.
- I tried OpenAI real-time transcription. Latency was better (I saw responses as fast as 200 ms), but with background noise it transcribed the wrong things. I may bring it back if I can make it stable.
- I briefly experimented with Gemini for transcribing and formatting multilingual text. Quality across languages was not as consistent as Whisper's.
Local experiments:
- I used FFmpeg + Whisper CLI in a bunch of ways: batching, buffering, and trying to “stream” partial results out of Whisper so it would feel live.
- I also tried a local Llama model to format the raw transcript into an email. On an M2 Pro this took ~2 seconds for short emails and got much slower for long text. The output looked nice, but the latency was not acceptable for everyday use.
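One common way to make Whisper feel live through buffering is to cut incoming audio into short, slightly overlapping windows. Each window is transcribed as soon as it fills, so a word split at a boundary still appears whole in at least one window.

This is a minimal sketch of that windowing step only. It assumes the audio is already decoded to 16 kHz mono samples. The function name and the 1.5 s / 0.25 s defaults are illustrative, and the sketch leaves out merging the duplicated text that the overlaps produce.

```python
from typing import Iterator, Sequence

SAMPLE_RATE = 16_000  # Whisper models work on 16 kHz mono audio


def overlapping_chunks(
    samples: Sequence[float],
    chunk_s: float = 1.5,
    overlap_s: float = 0.25,
    sample_rate: int = SAMPLE_RATE,
) -> Iterator[Sequence[float]]:
    """Yield fixed-length windows that overlap by overlap_s seconds.

    The final window may be shorter than chunk_s if the audio runs out.
    """
    chunk = int(chunk_s * sample_rate)
    step = chunk - int(overlap_s * sample_rate)
    if chunk <= 0 or step <= 0:
        raise ValueError("chunk must be positive and longer than the overlap")
    start = 0
    while start < len(samples):
        yield samples[start:start + chunk]
        if start + chunk >= len(samples):
            break  # this window already reached the end of the buffer
        start += step
```

Larger overlaps make boundary cuts less likely but re-transcribe more audio. That extra work counts against the latency budget.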
Where I ended up (for now):
- The current version sticks to FFmpeg + Whisper CLI locally, optimized for short chunks so text usually appears within about 1–2 seconds.
- I dropped the heavy on-device LLM formatting and kept the formatting logic much simpler, so it stays predictable and fast.
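The FFmpeg + Whisper CLI step for one short chunk might look like the sketch below. It assumes whisper.cpp's command-line tool, installed as `whisper-cli` in recent versions, plus a downloaded ggml model file. It also assumes the transcript is printed on stdout.

The post does not say which Whisper CLI or which flags Slave uses, so treat these arguments as an example rather than the app's actual invocation. The helper names are hypothetical. FFmpeg converts the recording to the 16 kHz mono 16-bit WAV that whisper.cpp reads, and `-nt` asks for plain text without timestamps.

```python
import os
import subprocess
import tempfile


def ffmpeg_to_whisper_wav(src: str, dst: str) -> list[str]:
    """FFmpeg argv: convert any recording to 16 kHz, mono, 16-bit PCM WAV."""
    return [
        "ffmpeg", "-y",       # overwrite dst if it already exists
        "-i", src,
        "-ar", "16000",       # resample to 16 kHz
        "-ac", "1",           # downmix to mono
        "-c:a", "pcm_s16le",  # 16-bit little-endian PCM
        dst,
    ]


def whisper_cli_command(model: str, wav: str, language: str = "auto") -> list[str]:
    """whisper.cpp argv (assumed binary name `whisper-cli`) for one chunk."""
    return [
        "whisper-cli",
        "-m", model,
        "-f", wav,
        "-l", language,  # "auto" lets whisper.cpp detect the spoken language
        "-nt",           # no timestamps, just the text
    ]


def transcribe_chunk(src: str, model: str, language: str = "auto") -> str:
    """Convert one recorded chunk, run whisper.cpp, and return its stdout (assumed to hold the transcript)."""
    with tempfile.TemporaryDirectory() as tmp:
        wav = os.path.join(tmp, "chunk.wav")
        subprocess.run(ffmpeg_to_whisper_wav(src, wav), check=True, capture_output=True)
        result = subprocess.run(
            whisper_cli_command(model, wav, language),
            check=True,
            capture_output=True,
            text=True,
        )
        return result.stdout.strip()
```

Keeping each chunk short keeps both subprocess calls quick. Choosing a smaller Whisper model is the other main lever, at some cost in accuracy.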
Next step is to re-introduce “smart” formatting and meeting notes, but only when I can do it without blowing up latency. Happy to dig deeper into any of these if people are curious.