by srimalireddi
11 days ago
You asked the right question, and it's the one blocking many people from productionizing this kind of solution on their websites.

If we break down the anatomy of the voice agent, it looks like this:

STT -> Ambient Retrieval (Moss) -> LLM [+ Tool calls -> On-Demand Retrieval (Moss)] -> TTS

STT, TTS and LLM output generation are fixed costs, independent of data scale. Retrieval is the only stage whose cost depends on how much content you index. In practice, a typical landing page and public-facing website will range from hundreds of docs (for startups) to hundreds of thousands of docs (for enterprises). Moss's retrieval stack runs in under 10 ms, with the following internal benchmarks:

- P99 of ~5.4 ms for 100K docs in a shared container
- P99 of ~4 ms for 1M docs in a dedicated VM

Our R&D team is pushing this to 200M+ docs while keeping the sub-10 ms promise, and the sky's the limit for our scale.
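A minimal Python sketch of that anatomy, assuming toy stand-ins for every stage. The names `stt`, `make_retriever`, `llm`, `tts`, `voice_turn`, `time_ms` and `percentile`, and the keyword-overlap scoring, are all hypothetical and not Moss's API. It shows that only the retriever touches the document set, and how a P99 figure like the ones above can be computed from latency samples using the nearest-rank method.

```python
import math
import time
from typing import Callable, Dict, List

# Hypothetical stand-ins for each stage: none of these are Moss APIs.
def stt(audio: bytes) -> str:
    # Fixed cost: pretend the "audio" is already a UTF-8 transcript.
    return audio.decode("utf-8")

def make_retriever(docs: Dict[str, str]) -> Callable[..., List[str]]:
    # The only stage whose work depends on how many docs are indexed.
    def retrieve(query: str, k: int = 2) -> List[str]:
        terms = set(query.lower().split())
        # Naive keyword overlap; a production system would use an index.
        # sorted() is stable, so ties keep the dict's insertion order.
        ranked = sorted(docs, key=lambda d: -len(terms & set(docs[d].lower().split())))
        return ranked[:k]
    return retrieve

def llm(question: str, context: List[str]) -> str:
    # Fixed cost. A real LLM could also call retrieve() again as a tool
    # (the "on-demand retrieval" step); that is omitted here.
    return f"Answer to '{question}' using: {', '.join(context)}"

def tts(text: str) -> bytes:
    # Fixed cost: pretend synthesis is just encoding.
    return text.encode("utf-8")

def voice_turn(audio: bytes, retrieve: Callable[..., List[str]]) -> bytes:
    text = stt(audio)
    context = retrieve(text)  # ambient retrieval before the LLM runs
    return tts(llm(text, context))

def time_ms(fn: Callable[[], object], runs: int) -> List[float]:
    # Wall-clock latency of each call, in milliseconds.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return samples

def percentile(samples: List[float], p: float) -> float:
    # Nearest-rank: smallest sample with at least p% of samples <= it.
    if not samples or not 0 < p <= 100:
        raise ValueError("need samples and 0 < p <= 100")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

if __name__ == "__main__":
    docs = {
        "pricing": "our pricing plans start free",
        "security": "we encrypt data at rest",
        "careers": "we are hiring engineers",
    }
    retrieve = make_retriever(docs)
    print(voice_turn(b"what are your pricing plans", retrieve))
    latencies = time_ms(lambda: retrieve("what are your pricing plans"), runs=1000)
    print(f"toy retriever P99: {percentile(latencies, 99):.4f} ms")
```

Timing this toy retriever says nothing about Moss's numbers; it only illustrates the measurement. Nearest-rank is one of several percentile definitions, and interpolating methods can report slightly different values on small sample sets.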