| Nice! I really like how many variations on this idea are coming out. MacWhisper used to be great, but is kinda of a buggy mess now. I'm making my own, for personal use. I did a survey of many and they all (that I could find) skip the fundamentals. The major issues that I've run into: - Crash recovery. Most of these apps are incredibly buggy and crash all the time, taking the recorded audio with them. Macwhisper is incredibly bad at this. - Disk space. Many of these apps save wav files to disk. After a few hours of meetings, you may end up with gigabytes eaten. - Microphone bleed. People don't always use headphones, the system mic will pick up the speaker sounds, causing duplicate (approximately) transcriptions. I've yet to find a solution that handles all these correctly, let alone having high quality transcriptions. Anyway, most of these apps are built around https://github.com/FluidInference/FluidAudio, if anyone is curious. Their readme has a big list of similar apps as well. |
I think I've got the other two bits covered. I pushed an update yesterday that adds active echo cancellation so that audio playing through the speakers (or leaky headphones) won't get transcribed twice if it is picked up by the microphone. It can be disabled in preferences, but it's on by default.
The disk space issue is one that I considered as well. By default, Trace deletes the actual audio recordings as soon as transcription is successfully completed, so the idea is you keep just the markdown transcript rather than the gigabytes of raw audio. If you want, there's a preference to disable the auto-deletion. There's a bit more on the support page here https://traceapp.info/support (search for "Auto-deletion of audio").
FluidAudio is a big part of this and is actually used in two places during a session. It runs the Parakeet EOU model for the instant recap (which isn't hugely accurate, but it's good enough for the job) and after the call it's also used to transcribe the recording, depending on which engine you've selected (Trace offers a fast and an accurate one). If the fast engine is selected, we use FluidAudio with the Parakeet-TDT 0.6b v3 model for transcription, which then goes through Pyannote and WeSpeaker for diarization. If the accurate engine is selected, we use WhisperKit with the Whisper large-v3-turbo model for transcription, and SpeakerKit for diarization.