by kypro 1074 days ago
This is quite off topic, but it reminded me of something I have been thinking about recently – perhaps at the limit all highly capable narrow AI systems must become generally intelligent.

I was thinking about the complexity of expression in TTS voice synthesizers recently and it struck me just how difficult a problem that is.

To be as expressive as a human, the AI model would need to fully "understand" the context of what is being said. Consider how a phrase like "I hate you" can be said in a loving way between friends sharing a joke at each other's expense, vs being said with anger or in sadness.

It got me wondering if all sufficiently complex problems require models to be generally intelligent – at least in the sense that they have deep, nuanced models of the world.

For example, perhaps for a self-driving car to be as "good" as a human it actually needs to be generally intelligent, in that it needs to understand that it's appropriate to drive differently in an emergency situation vs a leisurely weekend drive through a scenic part of town. When driving through my city after 8PM on the weekend I tend to drive slower and more cautiously because I know drunk people often walk out in front of my car – would a good self-driving car not need to understand these nuances of the world too?

This is interesting because it highlights just how important the element of human understanding is to accurately conveying expression in a voice synthesizer. While I'd argue modern voice synthesizers have been more intelligible than this for some time, the expressiveness of this machine has probably only recently been rivalled by state of the art AI models.

3 comments

Probably to some degree, but for your two examples I would argue that isn't necessary:

For TTS, the "tone" is something you should encode in the input rather than have TTS figure out. I can imagine ebook > LLM > annotated text with speakers, emotions etc > TTS. So the TTS can remain rather dumb.
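A rough sketch of what that hand-off could look like. Everything here is made up for illustration: the `AnnotatedLine` record, the emotion labels and the prosody numbers are assumptions, not any real TTS engine's API. The point is only that the "understanding" lives in the annotation step, and the TTS side is a plain table lookup.

```python
from dataclasses import dataclass


@dataclass
class AnnotatedLine:
    """One line of text as an upstream annotator (e.g. an LLM) might emit it."""
    speaker: str
    emotion: str
    text: str


# Hypothetical emotion -> prosody table; the numbers are illustrative only.
PROSODY = {
    "playful": {"rate": 1.1, "pitch_shift": 2.0},
    "angry": {"rate": 1.2, "pitch_shift": -1.0},
    "sad": {"rate": 0.8, "pitch_shift": -2.0},
}
NEUTRAL = {"rate": 1.0, "pitch_shift": 0.0}


def to_tts_request(line: AnnotatedLine) -> dict:
    """Turn an annotated line into settings a 'dumb' synthesizer could consume."""
    # Unknown emotion labels fall back to neutral delivery.
    prosody = PROSODY.get(line.emotion, NEUTRAL)
    return {"voice": line.speaker, "text": line.text, **prosody}


joke = to_tts_request(AnnotatedLine("Alice", "playful", "I hate you"))
fight = to_tts_request(AnnotatedLine("Alice", "angry", "I hate you"))
```

Same words, different settings: `joke` and `fight` differ only in the prosody fields, which is all the synthesizer would need to see.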

For the self-driving car, it shouldn't know cultural norms and be "more careful" sometimes. It should always know how much it sees and what stopping distance it can get with max braking and its reaction time and adjust accordingly.
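To make that concrete, here is a back-of-the-envelope sketch using the textbook constant-deceleration model: distance travelled during the reaction delay plus v²/(2a) of braking. The function names and the example numbers (0.5 s reaction time, 7 m/s² deceleration, 30 m of clear view) are my own assumptions for illustration, not figures from any real vehicle.

```python
import math


def stopping_distance(speed_mps: float, reaction_s: float, decel_mps2: float) -> float:
    """Distance covered during the reaction delay plus the braking phase."""
    return speed_mps * reaction_s + speed_mps ** 2 / (2 * decel_mps2)


def max_safe_speed(visible_m: float, reaction_s: float, decel_mps2: float) -> float:
    """Largest speed whose stopping distance still fits in the visible distance."""
    # Solve v*t + v^2/(2a) = d for v and take the positive root of the quadratic.
    at = decel_mps2 * reaction_s
    return -at + math.sqrt(at ** 2 + 2 * decel_mps2 * visible_m)


# Illustrative numbers only: 30 m of clear view, 0.5 s reaction, 7 m/s^2 braking.
v = max_safe_speed(visible_m=30.0, reaction_s=0.5, decel_mps2=7.0)
print(f"max speed: {v:.1f} m/s, stops in {stopping_distance(v, 0.5, 7.0):.1f} m")
```

Under this model the car needs no notion of "school kids" or "drunk pedestrians"; the speed drops on its own whenever the clear distance ahead shrinks.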

Agreed on stuff like emergencies etc.

> For the self-driving car, it shouldn't know cultural norms and be "more careful" sometimes. It should always know how much it sees and what stopping distance it can get with max braking and its reaction time and adjust accordingly.

I used to live next to two schools. In the morning before school the pavement and road outside my house were always full of school kids on bikes. During this time I'd drive with the assumption that at any moment a bike could pull out in front of my car, because those kids were nuts and often did.

But to assume this generally just to be safe would be extremely inconvenient. In reality, if I see a group of bikers wearing lycra I will assume they're competent bikers. While I'll still drive carefully, I won't assume they're about to pull out in front of my car.

If self driving cars operate with the assumption that every pedestrian is drunk and every bike on the road is ridden by a 12-year-old schoolboy, then no one will use them. Do self driving cars try to do this currently? If I jaywalked in front of a Tesla, is it designed to always be able to stop in time?

I'd expect self driving cars to have much better sensors and reaction times than we do, and as a consequence not to need to choose between avoiding those risks and actually carrying people from one point to another.

But they will probably be way slower than people on streets that run right next to sidewalks full of pedestrians.

> I'd expect self driving cars to have much better sensors and reaction times than we do

That is never going to happen.

Words like "good", "better" and "should" always carry freight that's often worth unpacking. Here, "better" really needs a definition.

A CCD is better than human eyes inasmuch as it captures a field rather than a narrow focus, with fuzzy periphery, that must be pointed at an object to resolve it.

I'm sure we could find metrics where a 360-degree lidar is better than human eyes.

It's disingenuous to pretend that sensor quality is the whole story, of course.

Human drivers have notoriously variable reflexes. I once rear-ended someone because I was inattentive. I assert that the current gen has better reflexes than some percentile of real-world meat-drivers, and I suspect that the percentile is higher than 90. Human reflexes simply aren't that quick without significant priming.

Yes. In Iain M. Banks’s Culture, even the guns are generally intelligent.
I think our current gen AI is only 1 piece of the puzzle.

This gen understands how to put words together to satisfy its internal requirement to please whoever gave it the instruction, but it has no volition of its own and no drive that it arrived at through its own cognition.

I believe GAI will need to have multiple current gen systems running simultaneously (in unison if not in harmony), simply to form a subconscious layer that a truly next gen AI would then pick and choose from.