by chrisco255 1152 days ago
We have access to video training data for driving, absolutely tons of it, as we've been attempting to train AI cars for more than two decades now. If GPT4 is what you say it is, we should be able to train it on that video data and solve autonomous driving. There is nothing inherent in transformers that prevents them from taking in video data. They've already been used that way by some researchers (https://arxiv.org/abs/2104.09224).

And yet you can take a 16 year old who's never driven and teach them to be decent at it within a week, and after maybe 50-100 hours of driving training they're competent. You don't need to show them a billion man-hours of driving footage. Even the first people to buy cars in the late 1800s, when they were first invented, were able to pick it up almost right away (there weren't even driving licenses back then).

At any rate, driving is just one example. Despite it being one of the oldest futuristic sci-fi examples, I don't see restaurants powered by AI. I don't see housekeeping powered by it.

Ok, those are embodied examples, so you say they're unfair. Fine. What remote-friendly jobs are being swept away by AI? Can we even do customer service with AI right now? No, outside of some "front line" chat bot (which just replaces phone trees and terrible localized search engines), we can't. Even if a GPT is trained on a business's proprietary documentation, it's wrong or unresponsive enough that it would cost you more than it would save you by firing your customer support staff.


> a 16 year old

A 16 year old has a decade of vision input, knows tens of thousands of words, has a coherent theory-of-mind, and has self-preservation instincts honed over hundreds of millions of years of evolution.

The equivalent scenario would be to take a fully general AGI(!) that already has a body and has learned to manipulate the physical world, put that in the car, and have it learn to drive.

A lot of people seem confused about these scenarios. The currently popular LLMs are like a child raised in a black box, and are weirdly retarded in the same way you would expect a child raised in a black box to be.

Similarly, driving AIs don't speak English, can't take instructions, and are simultaneously learning physics, theory-of-mind, the rules of the road, and signage conventions without having agency during most of their training.

> A 16 year old has a decade of vision input, knows tens of thousands of words, has a coherent theory-of-mind, and has self-preservation instincts honed over hundreds of millions of years of evolution.

No, they don't. Where were you driving 23 days ago? Or, if you could categorize your driving data in your head, what did you pass, precisely, in the car while you were driving 96 hours ago? AI training data has all of this. It has perfect timestamps at 25-30 frames per second (or more in some cases, with multiple cameras feeding frames into the dataset), with a full 360-degree view in many cases, adding up to millions of man-hours of driving data from thousands of drivers, which is longer than any human lifespan. A lot of times this data is supplemented with IR or LIDAR representations of the environment as well, something a human can't even see.

> Similarly, driving AIs don't speak English, can't take instructions, and are simultaneously learning physics, theory-of-mind, the rules of the road, and signage conventions without having agency during most of their training.

They're not learning that on their own. They're being poked and prodded by humans in the loop (and indeed, automated tagging tools) who have tweaked the models by adding numerical weights to "bad outcomes" and "good outcomes". Or "good categorizations" and "bad categorizations", or "aligned responses" and "unaligned responses". And because it's just a dumb statistical model, if you tell it the color blue is bad and that it should resist answering any questions related to the color blue, it will agree, because its entire heuristic model for organizing the world was designed by humans.

Similarly, it doesn't have a theory of mind. If it did, it would also have agency and it would already be AGI. Instead, it has access to examples of data of humans exhibiting theory of mind with other humans. And it's got enough parameters in its statistical model that it can accommodate a weighting for this "theory of mind" impact on what the expected output should be.

LLMs are nothing like an intelligent child raised in a black box. I mean, we already have an example of this, in someone who was both blind and deaf: https://en.wikipedia.org/wiki/Helen_Keller

> No they don't. Where were you driving 23 days ago? Or, if you could categorize your driving data in your head, what did you pass, precisely.

Even though I cannot do that, my brain still had this input and was "trained" on it. And we know the real neurons of the brain work completely differently in all regards (forming connectivity, activations, weighting) from our gross simplification of one activation function with a weight. So I'm not getting your arguments at all.

I have been trained on all these inputs, and even if I cannot recall them now, they manifested in my superior brain somehow.

It's nice that in theory the AI training data has all this available, but the resulting model is still much simpler, and it cannot recite all of the inputs it has seen either.

I mean everyday vision input, which generalises to novel scenarios because driving happens in the same world with the same rules of physics and optics. Shadow and light, depth and movement perception, etc… can all be reused.

This is like how LLMs are difficult to train to the point that they understand English, but then relatively easy to specialise.

Similarly, students don’t start their education at University, they spend nearly two decades getting to that point by learning the prerequisites.

> Similarly, it doesn't have a theory of mind. If it did, it would also have agency

These are unrelated concepts. There is evidence that GPT 4 has a rudimentary model of mind but it isn’t conscious itself and is a static model with zero agency over the world.