Hacker News
by solatic 43 days ago
Why does the voice need to be sent to the server? Why not perform speech-to-text on-device? Is the p10 phone/laptop not capable of this yet, despite every "dictation" feature I see in every modern OS?
1 comment

An eventual goal is likely to allow interacting with the LLM directly via audio tokens for both input and output, skipping TTS and STT completely.