Hacker News new | ask | show | jobs
by pico_creator 538 days ago
This is actually the hypothesis for cartesia (state space team), and hence their deep focus on voice model specifically. Taking full advantage of recurrent models constant time compute, for low latencies.

RWKV team's focus is still however is first in the multi-lingual text space, then multi-modal space in the future.

1 comments

Karan from Cartesia explains SSMs+voice really well: https://www.youtube.com/watch?v=U9DPRZ0lSIQ

its one of those retrospectively obvious/genius insights that i wish i understood when i first met him