Hacker News
by reissbaker 595 days ago
Eh, that depends. A small model that's voice-and-text is probably more useful to most people than a scaled-up voice-only model. A large voice-only model has to compete on intelligence with e.g. Qwen and Llama, since it can't be used in conjunction with them; a small voice+text model, by contrast, can serve as a cheap frontend that hides a larger, smarter, but more expensive text-only model behind it. This is an 8B model: running it is nearly free, and it fits on a 4090 with room to spare.
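The "fits on a 4090 with room to spare" point can be checked with a little arithmetic. This is a back-of-the-envelope sketch, assuming exactly 8 billion parameters and treating the card's 24 GB as 24 GiB. It counts weights only, so the KV cache, activations, and runtime overhead come on top. The precisions listed are common inference formats, not necessarily the ones this particular model ships in.

```python
# Back-of-the-envelope check: do the weights of an 8B-parameter model fit on
# an RTX 4090? Counts weights only; KV cache, activations, and runtime
# overhead need extra memory on top of this.

PARAMS = 8e9        # assumption: exactly 8 billion parameters
GPU_VRAM_GIB = 24   # assumption: the card's 24 GB treated as 24 GiB
GIB = 2**30         # bytes per GiB

# Bytes per parameter at common inference precisions (illustrative set).
BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gib(params: float, bytes_per_param: float) -> float:
    """Memory needed to hold the weights alone, in GiB."""
    return params * bytes_per_param / GIB


for precision, nbytes in BYTES_PER_PARAM.items():
    need = weight_memory_gib(PARAMS, nbytes)
    spare = GPU_VRAM_GIB - need
    print(f"{precision:>9}: {need:5.1f} GiB of weights, {spare:5.1f} GiB spare")
```

At 16-bit precision the weights take roughly 15 GiB, leaving about 9 GiB for everything else; quantizing to 8 or 4 bits frees up considerably more.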

On the one hand, a small team focused on voice-to-voice could probably do a lot better at voice-to-voice than a small team focused on voice-to-voice+text. On the other hand, a small team focused on making the most useful model would probably do better at that goal by focusing on voice+text rather than voice-only.

1 comment

Their goal isn't to build what's most useful for most people, though; that's the domain of the big AI players. They're small, so specialising works best: that's where they can have an edge as a company.

At the end of the day, the released product needs to be good and needs to ship in a reasonable amount of time. I highly doubt they could build a generic model as good as a more specialised one.

But if you think you know better than them, you could try contacting them, even though they look crazy laser-focused (their public email addresses are only for investors or job candidates).