by mitchsayre
45 days ago
Do you think Pollen is applicable to distributed AI inference? I think it could work for realtime voice agents running directly on mobile hardware. There are speech-to-speech LLMs that are big and do pure audio in, audio out. But you can also build voice agents from multiple smaller models in a cascade: ASR for transcription, an LLM for the response text, TTS for speech, plus interrupt detection. If you try to load ASR, LLM, and TTS models that actually do a good job onto the same mobile device all at once, you can't get it to run in realtime. But if you run them in a distributed setup, where each device has only one model loaded and streams its output to the device handling the next task, you might achieve realtime performance while using stronger models for each task. Does this sound possible, or am I misunderstanding how Pollen works?
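To make the cascade concrete, here is a rough sketch of the shape I have in mind. It is only an illustration: `asyncio` queues stand in for the network links between devices, the `asr`, `llm`, and `tts` functions are hypothetical placeholders for real models, and nothing here is Pollen's API.

```python
import asyncio

async def stage(fn, inbox, outbox):
    """One 'device': pull a chunk, apply its single model, forward the result."""
    while True:
        item = await inbox.get()
        if item is None:              # sentinel: upstream has finished
            await outbox.put(None)    # propagate shutdown downstream
            return
        await outbox.put(fn(item))

# Hypothetical stand-ins for the real models (assumptions, not real APIs).
def asr(audio_chunk):
    return f"text({audio_chunk})"

def llm(text):
    return f"reply({text})"

def tts(text):
    return f"audio({text})"

async def run_pipeline(chunks):
    fns = [asr, llm, tts]
    # One queue per link: input -> ASR -> LLM -> TTS -> output.
    queues = [asyncio.Queue() for _ in range(len(fns) + 1)]
    tasks = [
        asyncio.create_task(stage(fn, queues[i], queues[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for chunk in chunks:
        await queues[0].put(chunk)
    await queues[0].put(None)

    results = []
    while (item := await queues[-1].get()) is not None:
        results.append(item)
    await asyncio.gather(*tasks)
    return results

if __name__ == "__main__":
    print(asyncio.run(run_pipeline(["chunk0", "chunk1"])))
```

Each stage starts on a chunk as soon as the previous stage emits it, so later chunks are being transcribed while earlier ones are already in the LLM or TTS stage. Whether that overlap is enough for realtime would depend on the per-hop network latency, which this sketch does not model.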
I wonder if the limiting factor would be _which_ models can actually be compiled into a reasonably sized WASM module (I'm not familiar with this space right now; are you aware of efforts here?). If there are genuinely effective models that fit into reasonably sized WASM modules, then it would fit nicely.
All this with the previously acknowledged limitation that it's not yet on mobile (but perhaps a number of edge Pollen nodes could act as ingresses into the cluster in the interim).
I'm super interested to hear how you might employ it though, if you did start experimenting. I'd be interested to learn where it's useful and where it falls short. Please feel free to hit me up on GitHub or by email (in my profile)!