Hacker News
by donpark, 104 days ago
But I've read somewhere that the KV cache for a speech-to-speech model explodes in size with each turn, which could make on-device full-duplex S2S unusable except for quick chats.
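A rough back-of-the-envelope sketch of why the cache keeps growing: a decoder-only transformer stores one key and one value vector per KV head, per layer, for every token it has seen, so memory grows linearly with conversation length and every turn adds to it. The model shape (32 layers, 8 KV heads, head dim 128, fp16) and the rate of 25 audio tokens per second are hypothetical assumptions for illustration, not the figures of any real S2S model.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   num_tokens: int, bytes_per_elem: int = 2) -> int:
    """Approximate KV cache size for a decoder-only transformer.

    Factor of 2 = one key and one value vector per KV head, per layer, per token.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_elem


# Hypothetical model shape and audio token rate (assumptions, not a real model).
NUM_LAYERS = 32
NUM_KV_HEADS = 8
HEAD_DIM = 128
TOKENS_PER_SECOND = 25  # assumed rate of audio tokens the backbone attends over


def cache_gib_after(seconds: float) -> float:
    """KV cache size in GiB after `seconds` of conversation, under the assumptions above."""
    tokens = int(seconds * TOKENS_PER_SECOND)
    return kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, tokens) / 2**30


if __name__ == "__main__":
    # The cache never shrinks on its own, so each extra minute adds the same amount.
    for minutes in (1, 5, 10, 30):
        print(f"{minutes:>2} min of audio -> {cache_gib_after(minutes * 60):.2f} GiB")
```

Under these assumptions each token costs 128 KiB, so ten minutes of conversation already needs about 1.8 GiB of cache, which is a lot to keep resident next to the weights on a phone.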
tmzt, 104 days ago
Gemini Nano is supposedly doing it on-device. It looks like something similar should work with Apple's GPU and ANE.