It's interesting that your project approximates Thinking Machines' Interaction Models on a CPU-only setup. If you're considering adding efficient ASR to your voice agent on Linux, Windows, or Android, speech-core (which I maintain) offers a C++17 engine with ONNX Runtime and LiteRT support, and it could complement your setup well. https://github.com/soniqo/speech-core