| You can test Gemini 3.1 Lite transcription capabilities at https://ottex.ai, the only dictation app supporting Gemini models with native audio input. We benchmarked it for real-life voice-to-text use cases:

Model        <10s    10-30s  30s-1m  1-2m    2-3m
Flash        2548    2732    3177    4583    5961
Flash Lite   1390    1468    1772    2362    3499
Faster by    1.83x   1.86x   1.79x   1.94x   1.70x
(latency in ms, median over 5 runs per sample, non-streaming)

Key takeaways:
- 1.8x faster than Gemini 3 Flash on average
- ~1.4 sec transcription time for short to medium recordings
- ~$0.50/mo for heavy users (10h+ transcription)
- Close to SOTA audio understanding and formatting instruction following
- Multilingual: one model, 100+ languages

Gemini is slowly making $15/month voice apps obsolete. |
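
For anyone who wants to check the "Faster by" row and the "1.8x on average" claim in the quoted post, the arithmetic can be reproduced from the posted medians. This is only a sketch: the numbers are copied from the quote, and the names (`BUCKETS`, `speedups`, and so on) are made up for illustration.

```python
# Median latencies in ms, copied from the quoted benchmark post.
BUCKETS = ["<10s", "10-30s", "30s-1m", "1-2m", "2-3m"]
FLASH_MS = [2548, 2732, 3177, 4583, 5961]
FLASH_LITE_MS = [1390, 1468, 1772, 2362, 3499]


def speedups(slow_ms, fast_ms):
    """Return the slow/fast latency ratio for each bucket."""
    return [slow / fast for slow, fast in zip(slow_ms, fast_ms)]


ratios = speedups(FLASH_MS, FLASH_LITE_MS)
mean_ratio = sum(ratios) / len(ratios)  # unweighted mean over the five buckets

for bucket, ratio in zip(BUCKETS, ratios):
    print(f"{bucket:>7}: {ratio:.2f}x")
print(f"   mean: {mean_ratio:.2f}x")
```

The per-bucket ratios match the posted row, and the unweighted mean comes out at about 1.83x, consistent with the "1.8x" takeaway. The "~1.4 sec" figure corresponds to the first two Flash Lite buckets (1390 ms and 1468 ms).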
That much is easy, but what if you could also speak to and interrupt the main voice model and keep giving it instructions? Like talking to customer support, except that instead of being put on hold you can ask several questions and get live updates.
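
One way to picture that interruptible flow is a loop that cancels whatever reply is still playing as soon as a new utterance arrives. The sketch below is a toy simulation under loud assumptions: no real voice model or audio API is called, and `reply_for`, `speak`, and `converse` are hypothetical names. Playback is faked with `asyncio.sleep`.

```python
import asyncio


def reply_for(text):
    # Hypothetical stand-in for the voice model: a canned 30-chunk reply.
    return [f"{text}:{i}" for i in range(30)]


async def speak(chunks, spoken):
    """Simulate audio playback, one chunk at a time."""
    for chunk in chunks:
        spoken.append(chunk)
        await asyncio.sleep(0.01)  # pretend each chunk takes 10 ms to play


async def converse(utterances, spoken):
    """Start a reply per utterance; cancel any reply that is still playing."""
    current = None
    while (text := await utterances.get()) is not None:
        if current is not None and not current.done():
            current.cancel()  # barge-in: the user spoke over the reply
            spoken.append("[interrupted]")
        current = asyncio.create_task(speak(reply_for(text), spoken))
    if current is not None:
        await current  # let the final reply play to the end


async def demo():
    utterances = asyncio.Queue()
    spoken = []
    worker = asyncio.create_task(converse(utterances, spoken))
    await utterances.put("order-status")
    await asyncio.sleep(0.03)  # the first reply is only partly played
    await utterances.put("change-address")  # user interrupts with a new request
    await utterances.put(None)  # sentinel: no more utterances
    await worker
    return spoken


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In this toy version the first reply is cut off after a few chunks and the second plays in full. A real system would also need streaming speech recognition and voice activity detection to decide when the user has actually started talking, which this sketch leaves out.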