Hacker News
by boa00 2 days ago
Not sure I agree here

Text is just human thought in its simplest form. Writing is about expressing ideas, and there is an almost infinite number of ways to express them. That's an extremely difficult task, and LLMs only "imitate" it to the best of their training.

This is not at all true for voice. There are an infinite number of possible voices, but a finite number of tones and phonemes you can use to express the text.

It's a much easier technical problem; it's just much harder to gather proper data (you cannot just scrape Reddit and hope for the best, as LLMs do). And voice gets like 1/100th of LLMs' funding.

1 comment

Ironically, one of the things that makes AI-written text recognizable as AI is that it's too perfect. Too polished. Now think about speech patterns: they are way more than voice frequencies, tones and phonemes. One can say the same phrase a gazillion different ways, with different pauses, cadence, inflections, intonations and even pitch. Humans speak "imperfectly." It's very contextual too: in many situations, we voice the same words very differently. Again, it's possible that I don't know what I'm talking about, but every example of machine speech I've heard felt too mechanical, precisely because it lacked the nuance of how real humans speak.