by pstroqaty
299 days ago
If anyone's interested in a janky-but-works-great dictation setup on Linux, here's mine:

On key press, start recording the microphone to /tmp/dictate.mp3:

# Save up to 10 mins. Minimize buffering. Save pid
ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame -q:a 2 -flush_packets 1 -avioflags direct -loglevel quiet /tmp/dictate.mp3 &
echo $! > /tmp/dictate.pid

On key release, stop recording, transcribe with whisper.cpp, trim whitespace and print to stdout:

# Stop recording
kill $(cat /tmp/dictate.pid)
# Transcribe
whisper-cli --language en --model $HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin --no-prints --no-timestamps /tmp/dictate.mp3 | tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'

I keep these in a dictate.sh script and bind to press/release on a single key. A programmable keyboard helps here. I use https://git.sr.ht/%7Egeb/dotool to turn the transcription into keystrokes. I've also tried ydotool and wtype, but they seem to swallow keystrokes.

bindsym XF86Launch5 exec dictate.sh start
bindsym --release XF86Launch5 exec echo "type $(dictate.sh stop)" | dotoolc

This gives a very functional push-to-talk setup.

I'm very impressed with https://github.com/ggml-org/whisper.cpp. Transcription quality with large-v3-turbo-q8_0 is excellent IMO, and a Vulkan build is very fast on my 6600XT. It takes about 1s for an average sentence to appear after I release the hotkey.

I'm keeping an eye on the NVIDIA models; hopefully they work on ggml soon too. E.g. https://github.com/ggml-org/whisper.cpp/issues/3118.
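
For anyone wanting to copy this, here is a rough sketch of how the two snippets above might be combined into one dictate.sh with start and stop subcommands. The ffmpeg and whisper-cli flags and the paths are copied from the comment; the function names, the variable names and the polling loop that waits for ffmpeg to exit before transcribing are my own assumptions, not necessarily how the original script is laid out.

```shell
#!/usr/bin/env bash
# Hypothetical dictate.sh sketch. Flags and paths come from the comment above;
# function and variable names are made up for illustration.

AUDIO=/tmp/dictate.mp3
PIDFILE=/tmp/dictate.pid
MODEL="$HOME/.local/share/whisper/ggml-large-v3-turbo-q8_0.bin"

# Join lines, then strip leading and trailing whitespace.
trim() {
  tr -d '\n' | sed 's/^[[:space:]]*//;s/[[:space:]]*$//'
}

start_recording() {
  # Save up to 10 mins. Minimize buffering. Save pid.
  ffmpeg -f pulse -i default -ar 16000 -ac 1 -t 600 -y -c:a libmp3lame -q:a 2 \
    -flush_packets 1 -avioflags direct -loglevel quiet "$AUDIO" &
  echo $! > "$PIDFILE"
}

stop_and_transcribe() {
  local pid
  pid="$(cat "$PIDFILE")"
  kill "$pid" 2>/dev/null
  # Assumption (not in the original): poll until ffmpeg has exited, so the
  # mp3 is fully written before whisper-cli reads it.
  while kill -0 "$pid" 2>/dev/null; do sleep 0.05; done
  whisper-cli --language en --model "$MODEL" --no-prints --no-timestamps "$AUDIO" | trim
}

# Any other argument is a no-op, so the file can also be sourced.
case "${1:-}" in
  start) start_recording ;;
  stop)  stop_and_transcribe ;;
esac
```

With this on PATH, the two bindsym lines above should work unchanged: `dictate.sh start` on press, `dictate.sh stop` on release. The fractional `sleep 0.05` assumes GNU coreutils.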
> It is difficult to get a man to understand something, when his salary depends upon his not understanding it!
in the special case where the thing to be understood is "your app doesn't need to be a Big Fucking Deal". Maybe it pleases some users to wrap this in layers of additional abstraction and chrome and clicky buttons and storefronts, but in the end the functionality is already there with a couple of FOSS projects glued together in a bash script.
I used to think the likes of Suckless were brutalist zealots, but more and more I think they (and the Unix patriarchs) were right and the path to enlightenment is expressed in plain text.