by reissbaker
595 days ago
Eh, that depends. A small voice+text model is probably more useful to most people than a scaled-up voice-only model. The large voice-only model has to compete on intelligence with e.g. Qwen and Llama, since it can't be used in conjunction with them, whereas a small voice+text model can serve as a cheap frontend hiding a larger, smarter, but more expensive text-only model behind it. This is an 8B model: running it is nearly free, and it fits on a 4090 with room to spare. On the one hand, a small team focused on voice-to-voice could probably do a lot better at voice-to-voice than a small team focused on voice-to-voice+text. But a small team focused on making the most useful model would probably do better at that goal by focusing on voice+text rather than voice-only.
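For a rough sense of the "fits on a 4090" point, here is a back-of-the-envelope sketch. It counts weights only and assumes common precisions (16-bit, 8-bit, 4-bit); the comment above does not say which precision the model actually ships in, and real usage adds KV cache, activations, and framework overhead on top. The function and variable names are just illustrative.

```python
# Back-of-the-envelope check: do an 8B model's weights fit on a 24 GB GPU?
# Assumptions (not stated in the thread): weights-only estimate, ignoring
# KV cache, activations, and framework overhead; precisions are hypothetical.

GPU_MEMORY_GB = 24  # an RTX 4090 has 24 GB of VRAM


def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


PARAMS = 8e9  # "8B" parameters

estimates = {
    "fp16/bf16": weight_memory_gb(PARAMS, 2),
    "int8": weight_memory_gb(PARAMS, 1),
    "int4": weight_memory_gb(PARAMS, 0.5),
}

for name, gb in estimates.items():
    headroom = GPU_MEMORY_GB - gb
    print(f"{name}: ~{gb:.0f} GB of weights, ~{headroom:.0f} GB left over")
```

Even at 16-bit precision the weights alone leave several gigabytes free under these assumptions, which is consistent with "room to spare", though how much is really left depends on context length and the serving stack.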
|
At the end of the day, the released product needs to be good and needs to be done in a reasonable amount of time. I highly doubt they can do a generic model as well as a more specialised one.
But if you think you know better than them, you could try contacting them, even though it looks like they are laser-focused (their public email addresses are only for investors or job candidates).