Hacker News
by torginus 235 days ago
It can?

If you use 'multimodal transformer' instead of LLM (which most SOTA models are), I don't think there's any reason why a transformer arch couldn't be trained to drive a car. In fact, I'm sure that's what Tesla and co. are using in their cars right now.

I'm sure self-driving will become good enough to be commercially viable in the next couple of years (with some limitations), but that doesn't mean it's AGI.

2 comments

There is a vast gulf between "GPT-5 can drive a car" and "a neural network using the transformer architecture can be trained to drive a car". And I see no proof whatsoever that we can, today, train a single model that can both write a play and drive a car. Even less so one that could do both at the same time, as a generally intelligent being should be able to.

If someone wants to claim that, say, GPT-5 is AGI, then it is on them to connect GPT-5 to a car control system and inputs and show that it can drive a car decently well. After all, it has consumed all of the literature on driving and physics ever produced, plus untold numbers of hours of video of people driving.

>There is a vast gulf between "GPT-5 can drive a car" and "a neural network using the transformer architecture can be trained to drive a car".

The only difference between the two is training data that the latter has and the former lacks, so it's not a 'vast gulf'.

>And I see no proof whatsoever that we can, today, train a single model that can both write a play and drive a car.

You are not making a lot of sense here. You can have a model that does both. It's not some herculean task. It's literally just additional data in the training run. There are vision-language-action models that have been tested on public roads.

https://wayve.ai/thinking/lingo-2-driving-with-language/

> single model that can both write a play and drive a car.

It would be a really silly thing to do, and there are probably engineering subtleties as to why this would be a bad idea, but I don't see why you couldn't train a single model to do both.

It's not silly; it is in fact a clear necessity to have both of these for something to be even close to AGI. And you additionally need it trained on many other tasks. If you believe that each task requires additional parameters and additional training data, then it becomes very clear that we are nowhere near a general intelligence system, and it should also be pretty clear that this will not scale to 100 tasks with anything similar to the current hardware and training algorithms.

Okay, but then can a multimodal transformer do everything an LLM can?

Most SOTA LLMs are multimodal transformers.