by brucethemoose2 994 days ago
1 hour of video is a lot of data.

The bigger question is how well is it annotated? And how "dense" is the driving action?

1 hour of video is a lot of bytes. But 1 hour of driving doesn’t contain much of the total experiential set possible when driving. Generative AI can really only interpolate between things it has been trained on; it can’t extrapolate. This feels like an infinitesimal amount of driving data.

That said, a POC doesn’t have to be production worthy. In the hands of a major automobile company perhaps this tech is profoundly powerful. Or perhaps it’s a single model in an ensemble. Regardless it’s going to be interesting where this goes.

Generative AI can absolutely extrapolate, that's the whole reason it works.

The whole point of machine learning is to derive the underlying rules relating your input data. Extrapolation is just extending where you follow those "curves" beyond bounds of known data points.
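
To make the "following the curve beyond known data points" picture concrete, here is a minimal sketch, not taken from the thread. It assumes the simplest possible case: the underlying rule really is a straight line, and the model family (least squares on a line) matches it. The data and the names `fit_line` and `predict` are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: the rule y = 2x + 1, observed only for x in [0, 10].
xs = [float(x) for x in range(11)]
ys = [2.0 * x + 1.0 for x in xs]
slope, intercept = fit_line(xs, ys)


def predict(x):
    return slope * x + intercept


inside = predict(4.5)    # between known points: interpolation
outside = predict(20.0)  # beyond the known range: extrapolation
print(inside, outside)
```

Whether this carries over to a model whose learned rule does not match the true one outside the training range is exactly what the rest of the thread argues about.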

Oh, but it can’t, though with a sufficiently complex vector space it can seem like it can. What seems like extrapolation is an interpolation in the semantic vector space, particularly in the transformer / attention model. This is a key difference between human intelligence and current AI: it’s not able to “create” and see beyond what it’s been trained on. Any approximation of that is simply indicative of a very complete training set, and it is powerful enough to fool people with its expectation-based inference. But when you dig into the details of cutting-edge stuff you’re an expert in and ask it conceptual questions that extend beyond the semantic corpus embedded in its vector space, it will hallucinate, or, if well fine-tuned, admit lack of knowledge, because the best it can do is interpolate within its own semantic vector space.

But listen, I’m a big buyer of generative AI, what it does is incredible. But it’s useful to not ascribe more power to a tool than the math allows.

And there are very few machine learning algorithms that do extrapolation at all with any precision. Generally they project an expectation, often of some complex high-dimensional nonlinear system, which is amazing, but when they are confronted with a novel input pattern they are thrown off. The issue is they’re at their core probabilistic systems, and if the data experiences a regime change that’s unexpected the model will misbehave and output garbage.

You say it doesn't extrapolate then fashion your own nonsense definition and "explanations" for clear instances where it does. Lol Okay.

Enlighten us then how a generative AI model behaves when confronted with data outside its training space? Where in the model does it allow for the vector space to extend dynamically based on some other process to adapt to new regimes never seen before? Or does it necessarily construct its response by sampling the vector space, and in the case of transformers, apply attention / self attention to boost / dampen dimensions based on the semantic context? Extrapolation means being able to extend your decision space into new areas through synthesis and creativity, interpolation means walking within the trained vector space of the model. Clearly generative AI models as implemented today can’t extrapolate and always interpolate.

I think confusion comes from the idea that you can take a regression or expectation and extend it into the future and is that extrapolation. It isn’t - it’s interpolation still. You’re interpolating between a and a’ using the same function. Extrapolation takes the new regime and data and your existing training and adapts a new behavior. We don’t really understand how humans do this, and we don’t have any machine learning models that can.

To be clear, again, I’m not pooh-poohing ML or generative AI. I think it’s the most powerful thing we’ve created with computers so far. But it’s far from general intelligence, even if it’s a necessary part.

>Enlighten us then how a generative AI model behaves when confronted with data outside its training space?

It behaves just fine.

>I think confusion comes from the idea that you can take a regression or expectation and extend it into the future and is that extrapolation.

Congratulations, you've just defined extrapolation. Someone is definitely confused here but it isn't me.

Of course you can make any claim about what something can or can't do when you make up your definitions.

There are many, many clear examples of a language model extrapolating. Rather than accept this, you've opted to conjure up vague and meaningless definitions and distinctions on the fly.

This is so simple to see. Untestable definitions are meaningless. Please give us a test of "extrapolation" that all humans can perform and let's see how the language model does. You won't be able to, but by all means, give it a go.

>Extrapolation takes the new regime and data and your existing training and adapts a new behavior.

Great and Language Models do this.

> I think confusion comes from the idea that you can take a regression or expectation and extend it into the future and is that extrapolation.

That's the literal definition of extrapolation, so I think the confusion is coming from your side.

By this metric, a large number of humans can’t extrapolate either. In fact if you imagine your first paragraph were written about humans, it lines up pretty well.

Except all humans can extrapolate even if they don’t. Current generative models fundamentally cannot, even if you want them to.

However I would hold that I can prove you’re wrong. Have you ever seen a human play make believe when they’re young? Draw? You’re judging humanity by post indoctrination crushing of the soul for profit. But every human being, no matter how rigid and unthinking as an adult, was a creative genius at age four.

Don't be a dick, you know what is meant.

I wouldn’t go this far, but I would say the “a lot of humans can’t either” argument in LLM convos is a bit worn now. Where it’s true (hallucinating on the edges of certain knowledge, solving math and logical reasoning through approximation and most likely thinking) and where it’s not, it’s all been said already many times.

The key though is that in these things “most humans” isn’t a very useful comment when the discussion is “all AI.” The comment, even if true, acknowledges that there exist some humans who do and doesn’t refute that all AI don’t, so it doesn’t advance the discussion much. In a parallel comment I pointed out that all humans can even if they don’t appear to, and further asserted that all humans have even if they don’t appear to currently or consistently, so in this space of thinking and reasoning they are a distinct class from generative AI.

Huh? I disagree with the premise, and explained why.

Reading over your comments for the last few days, you seem consistently aggressive. If you need to vent to someone about something, you can DM me. Happy to just listen.

It can to a point, but it can’t to the extent that humans can with sufficiently complex problems.

Deep learning models like this can theoretically approximate pretty much any problem that can be expressed as a function.

It’s entirely possible that there just doesn’t exist a function from visual data (maybe even including LIDAR and RADAR etc) to correct driver decisions.

Humans can also intuit the behavior of other humans to an extent, even while driving (knowing that someone who is driving erratically is probably fucked up and will be dangerous to stay near). Kind of like a really shitty gossip protocol.

It can only approximate any function for which it’s seen data in the local feature spaces of the function. For anything it’s not seen features for it will do some maladapted interpolation through the feature space it has been trained on. It can’t be creative or synthesize a novel technique based on some more abstract reasoning over the new regime - it literally must attempt to fit its past observations as best it can to the new regime. Humans certainly do that too, but they are also able to step back and synthesize completely new behaviors given completely new data that isn’t just adapting old behavior based on some optimization function telling it that behavior is most appropriate in the new situation.

People are confused because interpolation is actually fairly powerful and is often entirely sufficient. Especially with the GPT4 model it’s so well trained with such a large and varied corpus that it is able to handle many things well, even unexpected things, and seems like it is extrapolating at times. But it still hallucinates, and these are the most obvious symptoms of its inability to extrapolate. It’s just fitting within its trained vector space as best it can.
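
A toy illustration of the "fits fine inside the training range, misfires outside it" behaviour described above. This is only a sketch under loud assumptions: it uses a 1-nearest-neighbour lookup, which is about the most purely interpolating model there is, and says nothing by itself about how transformers or GPT-4 behave. The target function and all names are hypothetical.

```python
def make_nearest_neighbour(xs, ys):
    """Return a predictor that answers with the y of the closest training x."""
    def predict(x):
        nearest = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
        return ys[nearest]
    return predict


def target(x):
    """The 'true' rule the model never sees directly (assumed for the demo)."""
    return x * x


# Training data only covers x in [0, 3].
train_x = [i / 10 for i in range(31)]
train_y = [target(x) for x in train_x]
predict = make_nearest_neighbour(train_x, train_y)

# Inside the covered range the answer is close to the truth.
inside_error = abs(predict(1.55) - target(1.55))

# Far outside it, the model can only reuse the nearest thing it has seen
# (the value at x = 3), so the error is large.
outside_error = abs(predict(10.0) - target(10.0))

print(inside_error, outside_error)
```

A model with a better inductive bias (for example, one that fits a quadratic) would do fine at x = 10 here, which is roughly the counter-argument made elsewhere in the thread.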

But ... all this goes for humans too! Is the argument that we should just outlaw driving altogether, in all possible forms?

One famous example of that is how to react correctly when the car starts to slip due to speed, braking, or driving on water, mud, sand, snow or ice. I think everyone knows people's reflexes are to floor the brakes and start wildly turning the steering wheel, which only results in total loss of control over the vehicle. Is anyone demanding drivers learn to correctly handle cars or other vehicles under these circumstances? There are only very minimal efforts, because it is completely impractical to teach many humans better driving practices. So we just accept the flaws ... and the constant stream of victims this generates.

Reality: Yes, a driving AI is not ready for all possible situations. It just isn't. It will never be. Is that a problem?

Also reality: Humans drive drunk. Humans drive while under the influence of drugs. Humans drive trucks near kids when they're so tired they can't keep their head lifted up. And roads are full of dead cats, squirrels, mice, ...

Also also reality: AI driving software can, after an accident, be taught to handle the situation that caused the accident, and the result of this learning process can then be uploaded to all instances of this software. Humans will keep making the same mistakes, with the same consequences, over and over and over again. Perhaps there is very slow improvement (mostly by modifying roads), but it takes decades.

Practical view: I have driven around in Mountain View next to self-driving cars. One thing's for sure: self-driving cars behave much better than human drivers. Including me. It's very irritating how well they behave on the road. If the roads have many self-driving cars, I'm pretty damn sure it'll result in far fewer accidents and lower transit times. Never mind that self-driving cars of course solve the parking problem. I don't get why people hate them.

And I hate this goalpost moving where AIs are compared to multiple top-performing humans, that you see everywhere. Of course, there are now cases where AIs have actually beaten groups of top-performing humans (translation, chess, Go, robot control, ...)

Your arguments don't really fit what was previously said.

There was nothing said against driving AI in general, just that 4700h of videos seems low.

I also get that humans are pretty bad drivers, but isn't that exactly why we shouldn't use them as the baseline for AIs to compare to?

We are now at a point where we can set high standards for AI, so we get a best possible result, because while it isn't feasible to have everyone learn driving over a couple of years, a good AI has to be trained once and can be used by many, so we have the time.

And sure, it can be updated, but should we really trust companies to keep innovating once they are already allowed to have the AI in use? The incentive to do so is far bigger if they have to do so before they got any money out of it.

Interestingly there’s another thread I’m in about generative AI where someone asserts “this goes for humans too” in a sort of similar vein.

However it’s not the case. Humans have the ability to extrapolate from their training data and synthesize new thought and behavior from situations not seen before that’s fundamentally insightful and adaptive. Generative AI and all machine learning I’m aware of are fundamentally expectation driven probabilistic models that synthesize highly dimensional non linear functions from samples of those functions, which means they can’t adapt to new situations dynamically and extrapolate their experiences into new experiences and make decisions that are novel and intuitive given a new regime.

When these models encounter new situations they’ve never experienced or new regimes they interpolate in their learning space to a most likely behavior, but the learning space has nothing similar in it, so the behavior can become seemingly random and highly maladapted.

A classical example of this is obviously LLM hallucinations at the edge of their knowledge which an expert in a field can induce by asking questions beyond the horizon of the field. While humans might not have answers, they can pose interesting theories, while LLMs really can’t - if they appear to, it’s simply because their training set is so massive they can interpolate into babbling that sounds good. You could assert humans do this too, and it’s true they do at times, but they also don’t at other times and have novel insights beyond their experience. The fact they can do this sometimes and AI, mathematically due to its internal structure, can’t at any time is the difference.

Another example is Go playing AI. They can do really well against expert players until someone plays a nonsense series of plays that are random and the AI begins to play worse than amateurs. You can do this with LLMs too, if you give them enough random nonsense or repeated strings, they just leap into some random spot in their vector space and rant about weirdness. Even GPT4 does this.

The answer isn’t to outlaw driving or to stop pursuing AI driving assistants. It’s to build models with an enormous well labeled corpus that covers almost every possible situation, but also build in fail safes that make it easy for a human to be called to attention and take control when things are confusing the AI.

1 hour of driving isn't a lot of driving, or a lot of anything really. There are heaps of individual things that happen maybe once or twice in a lifetime that we're already somewhat equipped to deal with based on our other lived experiences - for example, seeing a ball bouncing towards the road and a child or dog chasing after it, I'm already starting to brake before they even approach the edge of the sidewalk. Or seeing a fast-moving wheel bouncing down the road - I know from watching dashcam footage on youtube that you'd want to keep WELL clear of a 30+kg obstacle with a whole bunch of angular momentum, because it will fuck up anything in its path. Good luck to an AI to figure out what's going on, or what the path of a single fast-rolling wheel is going to look like.

1) how many hours does a pilot need on a 'type' to fly unsupervised?

2) it seems kind of a meaningless unit here, because nobody said it was real world real-time driving hours? And even if it is, if they were logged with the intention of finding 'interesting' scenarios vs. just A-B motorway driving would make an enormous difference.

> how many hours does a pilot need on a 'type' to fly unsupervised?

Excellent question. And the answer is between 45 hours and a few hundred million years. Hear me out.

45 hours is because that is the minimum amount of flying you need to get a private pilot licence: https://www.takeflightaviation.co.uk/ppl-guide.html

(Ignoring here ultra lights and paragliders where you sometimes don't need a licence in some jurisdictions.)

So that is the straightforward answer maybe you are looking for.

Then we might say that all pilots are required to be at least 17 years old to obtain a private pilot licence. So that is 148920 hours of pre-training in perceiving objects and movement, and coordinating one's actions with perception.
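
For anyone checking the 148920 figure: it is 17 years of 365 days at 24 hours each, which assumes leap days are ignored and counts every hour, asleep or awake. A quick sketch of the arithmetic (the variable names are just for illustration):

```python
YEARS = 17            # minimum age mentioned above
DAYS_PER_YEAR = 365   # assumption: leap days ignored
HOURS_PER_DAY = 24    # assumption: every hour counts, including sleep

hours_alive = YEARS * DAYS_PER_YEAR * HOURS_PER_DAY
flight_hours = 45     # minimum flying hours quoted above

print(hours_alive)                 # 148920
print(hours_alive / flight_hours)  # lived hours per required flight hour
```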

Then one might also say that one is required to be a human, and that comes with hundreds of millions of years of pre-training where all of our ancestors were evolutionarily selected to be good at perceiving and moving. (At least good enough to survive until they could propagate their genes.)

Now this answer might come off as flippant, and maybe it is. What I'm trying to say is that it is hard to compare "training hours" directly between computers and humans. And it is hard because of two things which humans have: "pretraining by lived experience" and "pretraining by evolution".

Don't most of the competitors have like hundreds of thousands to millions of hours of driving?

With lidar point clouds corresponding to every frame...

...Yeah, that is more like it.

This method is unsupervised and does not train on annotated data. They use a similar approach to GPT: autoregressive sequence modelling.

This looks like self-supervised training to generate new videos, so it's unlikely to be annotated/labeled.
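
For readers unfamiliar with the term: autoregressive sequence modelling means predicting the next element from the ones before it, so the sequence itself supplies the training targets and no annotation is needed. A toy sketch of that idea follows. It is a bigram counter over made-up "frame" tokens, not the model being discussed, and every name in it is hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(tokens):
    """Count which token follows which; the 'labels' are just the next tokens."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def generate(counts, start, steps):
    """Greedy autoregressive rollout: each output becomes the next input."""
    out = [start]
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break  # nothing ever followed this token in training
        out.append(followers.most_common(1)[0][0])
    return out


# Hypothetical, unlabeled "driving log" reduced to coarse tokens.
frames = ["straight", "brake", "turn", "straight", "brake", "turn", "straight"]
model = train_bigram(frames)

rollout = generate(model, "brake", 3)
print(rollout)
```

A real video model would replace the count table with a neural network over learned tokens, but the training signal has the same shape: the next item in the recording.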