| HN Mirror


by kushalc 289 days ago

Hey folks, OOP/original author and 20-year HN lurker here — a friend just told me about this and thought I'd chime in.

Reading through the comments, I think there's one key point that might be getting lost: this isn't really about whether scaling is "dead" (it's not), but rather how we continue to scale for language models at the current LM frontier — 4-8h METR tasks.

Someone commented below about verifiable rewards and IMO that's exactly it: if you can find a way to produce verifiable rewards about a target world, you can essentially produce unlimited amounts of data and (likely) scale past the current bottleneck. Then the question becomes, working backwards from the set of interesting 4-8h METR tasks, what worlds can we make verifiable rewards for and how do we scalably make them? [1]
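A minimal sketch of what "verifiable rewards about a target world" could look like, using integer arithmetic as a stand-in world. Everything here (`make_task`, `reward`, `toy_policy`) is a hypothetical illustration, not anything from the original article: the point is only that when the checker is exact, tasks and reward signals can be generated without limit.

```python
import random

def make_task(rng):
    """Sample one task from a toy 'world' (integer arithmetic) with a known answer."""
    a, b = rng.randint(0, 999), rng.randint(0, 999)
    op = rng.choice(["+", "-", "*"])
    answer = {"+": a + b, "-": a - b, "*": a * b}[op]
    return f"{a} {op} {b}", answer

def reward(candidate, answer):
    """Verifiable reward: 1.0 only if the candidate parses to the exact answer."""
    try:
        return 1.0 if int(candidate.strip()) == answer else 0.0
    except ValueError:
        return 0.0

def toy_policy(prompt):
    """Hypothetical stand-in for a model; it ignores the operator and always adds."""
    a, _op, b = prompt.split()
    return str(int(a) + int(b))

rng = random.Random(0)
tasks = [make_task(rng) for _ in range(1000)]  # could be any number of tasks
mean_reward = sum(reward(toy_policy(p), ans) for p, ans in tasks) / len(tasks)
print(f"mean reward over {len(tasks)} generated tasks: {mean_reward:.2f}")
```

The hard part the comment points at is not this loop but finding worlds where a checker like `reward` exists for tasks that are actually interesting.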

Which is to say, it's not about more data in general, it's about the specific kind of data (or architecture) we need to break a specific bottleneck. For instance, real-world data is indeed verifiable and will be amazing for robotics, etc. but that frontier is further behind: there are some cool labs building foundational robotics models, but they're maybe ~5 years behind LMs today.

[1] There's another path with better design, e.g. CLIP that improves both architecture and data, but let's leave that aside for now.

10 comments

FloorEgg 289 days ago

10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work. Not sure exactly where I got the impression, but I remember some "rising tide of AI" analogy and graphic that had artists and scientists positioned on the high ground.

Recently it doesn't seem to be playing out as such. The current best LLMs I find marvelously impressive (despite their flaws), and yet... where are all the awesome robots? Why can't I buy a robot that loads my dishwasher for me?

Last year this really started to bug me, and after digging into it with some friends I think we collectively realized something that may be a hint at the answer.

As far as we know, it took roughly 100M-1B years to evolve human level "embodiment" (from single-celled organisms to humans), but it only took around 100k-1M years for humanity to evolve language, knowledge transfer and abstract reasoning.

So it makes me wonder, is embodiment (advanced robotics) 1000x harder than LLMs from an information processing perspective?
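For what it's worth, the 1000x figure comes from pairing like ends of the two ranges quoted above. A quick check, using only the numbers in the comment (which are themselves rough orders of magnitude), shows the spread if the ends are paired the other way:

```python
# Figures quoted in the comment, in years; both are rough ranges.
embodiment_years = (100e6, 1e9)  # single-celled organisms -> humans
language_years = (100e3, 1e6)    # language, knowledge transfer, abstract reasoning

# Pairing low with low and high with high gives the "1000x" in the comment.
ratio_low = embodiment_years[0] / language_years[0]
ratio_high = embodiment_years[1] / language_years[1]

# Pairing opposite ends gives the widest spread the ranges allow.
ratio_min = embodiment_years[0] / language_years[1]
ratio_max = embodiment_years[1] / language_years[0]

print(ratio_low, ratio_high)  # 1000.0 1000.0
print(ratio_min, ratio_max)   # 100.0 10000.0
```

So the estimate is closer to "somewhere between 100x and 10,000x" than a sharp 1000x.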

breuleux 289 days ago

> So it makes me wonder, is embodiment (advanced robotics) 1000x harder than LLMs from an information processing perspective?

Essentially, yes, but I would go further in saying that embodiment is harder than intelligence in and of itself.

I would argue that intelligence is a very simple and primitive mechanism compared to the evolved animal body, and the effectiveness of our own intelligence is circumstantial. We manage to dominate the world mainly by using brute force to simplify our environment and then maintaining and building systems on top of that simplified environment. If we didn't have the proper tools to selectively ablate our environment's complexity, the combinatorial explosion of factors would be too much to model and our intelligence would be of limited usefulness.

And that's what we see with LLMs: I think they model relatively faithfully what, say, separates humans from chimps, but it lacks the animal library of innate world understanding which is supposed to ground intellect and stop it from hallucinating nonsense. It's trained on human language, which is basically the shadows in Plato's cave. It's very good at tasks that operate in that shadow world, like writing emails, or programming, or writing trite stories, but most of our understanding of the world isn't encoded in language, except very very implicitly, which is not enough.

What trips us up here is that we find language-related tasks difficult, but that's likely because the ability evolved recently, not because they are intrinsically difficult (likewise, we find mental arithmetic difficult, but it is not intrinsically so). As it turns out, language is simple. Programming is simple. I expect that logic and reasoning are also simple. The evolved animal primitives that actually interface with the real world, on the other hand, appear to be much more complicated (but time will tell).

FloorEgg 289 days ago

Nicely said. This all aligns with my intuition, with one caveat.

I think you and I are using different definitions of intelligence. I'm bought into Karl Friston's free energy principle and think it's intelligence all the way down. There is no separating embodiment and intelligence.

The LLM distinction is intelligence via symbols as opposed to embodied intelligence, which is why I really like your shadow world analogy. Without getting caught up in subtle differences in our ontologies, I agree wholeheartedly.

breuleux 289 days ago

You're right, we probably have different ontologies. To me an intelligent system is a system which aims to realize a goal through modelling its environment and planning actions to bring about that intended state. That's more or less what humans do and I think that's more in line with the colloquial understanding of it.

There are basically two approaches to defining intelligence, I think. You can either define it in terms of capability, in which case a system that has no intent and does not plan can be more intelligent than one that does, simply by virtue of being more effective. Or you can define it in terms of mechanism: something is intelligent if it operates in a specific way. But it may then turn out to be the case that some non-intelligent systems are more effective than some intelligent systems. Or you can do both and assume that there is some specific mechanism (human intelligence, conveniently) that is intrinsically better than the others, which is a mistake people commonly make and is the source of a lot of confusion.

I tend to go for the second approach because I think it's a more useful framing to talk about ourselves, but the first is also consistent. As long as we know what the other means.

FloorEgg 289 days ago

If intelligence is treated as a scale, should it be measured primarily by (a) the diversity of valid actions an entity can take combined with its ability to collect and process information about its environment and predict outcomes, or (b) only by its ability to collect and process information and predict outcomes?

In either case, the smallest unit of intelligence could be seen as a component of a two-field or particle interaction, where information is exchanged and an outcome is determined. Scaled up, these interactions generate emergent properties, and at each higher level of abstraction, new layers of intelligence appear that drive increasing complexity. Under such a view, a less intelligent system might still excel in a narrow domain, while a more intelligent system, effective across a broader range, might perform worse in that same narrow context.

Depending on the context of the conversation, I might go along with some cut-off on the scale, but I don't see why the scale isn't continuous. Maybe it has stacked s-curves though...

We just happen to exist at an interesting spot on the fractal that's currently the highest point we can see. So it makes sense we would start with our own intelligence as the idea of intelligence itself.

GeorgeTirebiter 289 days ago

I think it's an issue of hierarchies and the Society of Mind (Minsky). If a human hand, or any animal's end effector, touches a hot stove, a lower-level process instantly pulls the hand/paw away from the heat. There are no doubt thousands of these 'smart body, no brain' interactions that take over in certain situations, conscious thinking not required.

Ken Goldberg shows that getting robots to operate in the real world using methods that have been successful getting LLMs to do things we consider smart -- getting huge amounts of training data -- seems unlikely. The vastness between what little data a company like Physical Intelligence has vs what GPT-5 uses is shown here: https://drive.google.com/file/d/16DzKxYvRutTN7GBflRZj57WgsFN... 84 seconds

Ken advocates plenty of Good Old-Fashioned Engineering to help close this gap, and worries that demos like Optimus actually set the field back because expectations are set too high. Like the AI researchers who were shocked by LLMs' advances, it's possible something out of left field will close this training gap for robots. I think it'll be at least 5 more years before robots will be among us as useful in-house servants. We'll see if the LLM hype has spilled over too much into the humanoid robot domain soon enough.

pmontra 289 days ago

> But it may then turn out to be the case that some non-intelligent systems are more effective than some intelligent systems.

That is surely the case on limited scopes. For example the non neural net chess engines are better at chess than any human.

I think that neural networks compare with human intelligence in a fair way, because we should limit their training to the number of games that human professionals can reasonably play in their life. Alphago won't be much good after playing, let's say, 10 thousand games even starting from the corpus of existing human games.

coldtea 289 days ago

>There is no separating embodiment and intelligence.

And yet whatever IQ you have, it can't make you just play the violin without actually having embodied practice first.

thfuran 289 days ago

If you have sufficient motor control and dexterity, the amount of required practice should be approximately zero. Just calculate the required finger position and bow orientation, pressure, and velocity for optimal production of the desired sound and do that. That is not how humans perform physical tasks though.

commakozzi 284 days ago

> That is not how humans perform physical tasks though

is it not though? wouldn't it just be that our processing center isn't located completely in the skull as we typically think, but is extended to our spinal cord and nervous system? Something is being processed, you're just not conscious of the entire process. This is especially clear to me as a musician: as you're learning to play, you have to be absolutely aware of all of those processes until you can finally just let go and play!

djmips 289 days ago

You've captured a lot here with your shadow world summary. Very well done - I've been feeling this and now you've turned it into words and I'm pretty sure you're correct!

highfrequency 289 days ago

> We manage to dominate the world mainly by using brute force to simplify our environment and then maintaining and building systems on top of that simplified environment. If we didn't have the proper tools to selectively ablate our environment's complexity…

This is very interesting and I feel there is a lot to unpack here. Could you elaborate on this theory with a few more paragraphs (or books / blogs that elucidate this)? In what ways do we use brute force to simplify the environment, and are there not ways in which we use highly sophisticated leveraged methods to simplify our environment tools? What proper tools allow us to selectively ablate complexity? Why does our intelligence only operate on simplified forms?

Also, what would convince you that symbolic intelligence is actually “harder” than embodied intelligence? To me the natural test is how hard it is for each one to create the other. We know it took a few billion years to go from embodied intelligence (ie organisms that can undergo evolution, with enough diversity to survive nearly any conditions on Earth) to sophisticated symbolic intelligence. What if it turns out that within 100 years, symbolic intelligence (contained in LLM like systems) could produce the insights to eg create new synthetic life from scratch that was capable of undergoing self-sustained evolution in diverse and chaotic environments? Would this convince you that actually symbolic intelligence is the harder problem?

lucketone 289 days ago

Not OP, but several examples:

A. instead of building a house on random terrain with random materials, first we prefer to flatten the place, then we use standard materials (e.g. bricks), which were produced from a simple source (e.g. a large and relatively homogeneous deposit of clay).

B. For mental tasks it’s usual to say that a person can handle only 7 items at a time (if you disagree multiply by 2-3). But when you ride a bike you process more inputs at the same time (you hear a car behind you, you see a person on the right, you feel your balance, you anticipate your direction, if you feel strong wind or sun on your face you probably squint your eyes, you take a breath of air. On top of that all the processes of your body adjust and support your riding: heart, liver, stomach…)

C. “Spherical cows” in physics. (Google this if needed)

breuleux 289 days ago

> Why does our intelligence only operate on simplified forms?

Part of the issue with discussing this is that our understanding of complexity is subjective and adapted to our own capabilities. But the gist of it is that the difficulty of modelling and predicting the behavior of a system scales very sharply with its complexity. At the end of the scale, chaotic systems are basically unintelligible. Since modelling is the bread and butter of intelligence, any action that makes the environment more predictable has outsized utility. Someone else gave pretty good examples, but I think it's generally obvious when you observe how "symbolic-smart" people think (engineers, rationalists, autistic people, etc.) They try to remove as many uncontrolled sources of complexity as possible. And they will rage against those that cannot be removed, if they don't flat out pretend they don't exist. Because in order to realize their goals, they need to prove things about these systems, and it doesn't take much before that becomes intractable.

One example of a system that I suspect to be intractable is human society itself. It is made out of intelligent entities, but as a whole I don't think it is intelligent, or that it has any overarching intent. It is insanely complex, however, and our attempts to model its behavior do not exactly have a good record. We can certainly model what would happen if everybody did this or that (aka a simpler humanity), but everybody doesn't do this and that, so that's moot. I think it's an illuminating example of the limitations of symbolic intelligence: we can create technology (simple), but we have absolutely no idea what the long term consequences are (complex). Even when we do, we can't do anything about it. The system is too strong, it's like trying to flatten the tides.

> To me the natural test is how hard it is for each one to create the other.

I don't think so. We already observe that humans, the quintessential symbolic intelligences, have created symbolic intelligence before embodied intelligence. In and of itself, that's a compelling data point that embodied is harder. And it appears likely that if LLMs were tasked to create symbolic intelligences, even assuming no access to previous research, they would recreate themselves faster than they would create embodied intelligences. Possibly they would do so faster than evolution, but I don't see why that matters, if they also happen to recreate symbolic intelligence even faster than that. In other words, if symbolic is harder... how the hell did we get there so quick? You see what I mean? It doesn't add up.

On a related note, I'd like to point out an additional subtlety regarding intelligence. Intelligence (unlike, say, evolution) has goals and it creates things to further these goals. So you create a new synthetic life. That's cool. But do you control it? Does it realize your intent? That's the hard part. That's the chief limitation of intelligence. Creating stuff that is provably aligned with your goals. If you don't care what happens, sure, you can copy evolution, you can copy other methods, you can create literally anything, perhaps very quickly, but that's... not smart. If we create synthetic life that eats the universe, that's not an achievement, that's a failure mode. (And if it faithfully realizes our intent then yeah I'm impressed.)

Nevermark 288 days ago

I think a lot of this is true, but not as critical as is being interpreted.

Compare the economics of purely cognitive AI to in-world robotics AI.

Pure cognitive: Massive scale systems for fast, frictionless and incredibly efficient cognitive system deployment and distribution of benefits are solved. On tap even. Cloud computing and the Internet.

What is the amortized cost per task? Almost nothing.

In-world: The cost of extracting raw resources, parts chain, material process chain, manufacturing, distributing, maintaining, etc.

Then what is the amortized cost per task, for one robot?

Several orders of magnitude more expensive, per task! There is no comparison.

Doing that profitably isn’t going to be the norm for many years.

At what price does a kitchen robot make sense? Not at $1,000,000. “Only $100,000?” “Only $25,000?” “Only $10k”? Lower than that?

Compared to a Claude plan? That many people still turn down just to use free tier?

Long before general house helper robots make any economic sense, we will have had walking, talking, socializing, profitable-to-build sex robots at higher price points for price insensitive owners.

There are people who will pay high prices for that, when costs come down.

That will be the canary for general robotic servants or helpers.

The cost isn’t intelligence. There isn’t a particular challenge with in-world information processing and control. It’s the cost of the physical thing that processing happens in.

This is a purely economic problem. Not an AI problem at all.

programjames 289 days ago

It took about the same amount of time to evolve human-level intelligence as human-level mobility. Pretty much no other animal walks on two legs...

trescenzi 289 days ago

This is interesting to think about. It’s basically just birds and primates. Birds have an ancient evolutionary tree as they are dinosaurs, which did actually walk on two legs. But the gap between dinos and primates walking on two feet, I think, is tens of millions of years. So yea pretty long time.

noduerme 289 days ago

This makes me think something else, though. Once we were able to reason about the physics behind the way things can move, we invented wheels. From there it's a few thousand years to steam engines and a couple hundred more years to jet planes and space travel.

We may have needed a billion years of evolution from a cell swimming around to a bipedal organism. But we are no longer speed limited by evolution. Is there any reason we couldn't teach a sufficiently intelligent disembodied mind the same physics and let it pick up where we left off?

I like the notion of the LLM's understanding being "shadows on the wall of Plato's cave," and language may be just that. But math and physics can describe the world much more precisely and, if you pair them with the linguistic descriptors, a wall shadow is not very different from what we perceive with our own senses and learn to navigate.

breuleux 289 days ago

Note that wheels, steam engines, jet planes, spaceships wouldn't survive on their own in nature. Compared to natural structures, they are very simple, very straightforward. And while biological organisms are adapted to survive or thrive in complicated, ever-changing ecosystems, our machines thrive in sanitized environments. Wheels thrive on flat surfaces like roads, jet planes thrive in empty air devoid of trees, and so on. We ensure these conditions are met, and so far, pretty much none of our technology would survive without us. All this to say, we're playing a completely different game from evolution. A much, much easier game. Apples and oranges.

As for limits, in my opinion, there are a few limits human intelligence has that evolution doesn't. For example, intent is a double-edged sword: it is extremely effective if the environment can be accurately modelled and predicted, but if it can't be, it's useless. Intelligence is limited by chaos and the real world is chaotic: every little variation will eventually snowball into large scale consequences. "Eventually" is the key word here, as it takes time, and different systems have different sensitivities, but the point is that every measure has a half-life of sorts. It doesn't matter if you know the fundamentals of how physics work, it's not like you can simulate physics, using physics, faster than physics. Every model must be approximate and therefore has a finite horizon in which its predictions are valid. The question is how long. The better we are at controlling the environment so that it stays in a specific regime, the more effective we can be, but I don't think it's likely we can do this indefinitely. Eventually, chaos overpowers everything and nothing can be done.
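The "finite horizon" point can be seen in a small numerical sketch. The logistic map below is a standard textbook chaotic system chosen here purely for illustration (it is not something the comment refers to), and `horizon`, the starting value and the tolerance are arbitrary assumptions. Two trajectories that start almost identically are stepped forward until they disagree by more than the tolerance.

```python
def logistic(x, r=4.0):
    """One step of the logistic map; r = 4.0 is in the chaotic regime."""
    return r * x * (1.0 - x)

def horizon(eps, x0=0.2, tol=0.1, max_steps=1000):
    """Steps until two trajectories starting eps apart differ by more than tol."""
    a, b = x0, x0 + eps
    for step in range(1, max_steps + 1):
        a, b = logistic(a), logistic(b)
        if abs(a - b) > tol:
            return step
    return None  # did not diverge within max_steps

for eps in (1e-4, 1e-7, 1e-10):
    print(f"initial error {eps:.0e} -> useful horizon of {horizon(eps)} steps")
```

Shrinking the initial measurement error by a factor of a million buys only a modest number of extra steps, which is one way to read "every measure has a half-life of sorts".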

Evolution, of course, having no intent, just does whatever it does, including things no intelligence would ever do because it could never prove to its satisfaction that it would help realize its intent.

baq 289 days ago

Something that isn’t obvious when we’re talking about the invention of the wheel: we aren’t actually talking about the round shape thing, we’re actually talking about the invention of the axle which allowed mounting a stationary cart on moving wheels.

card_zero 289 days ago

Yes, only humans, birds, sifakas, pangolins, kangaroos, and giant ground sloths. Only those six groups of creatures, and various lizards including the Jesus lizard which is bipedal on water, just those seven groups and sometimes goats and bears.

trescenzi 289 days ago

I get what you mean, that’s why the basically is there. Most, kangaroos and some lemurs in your list being the exception, do not move around primarily as bipeds. The ability to walk on two legs occasionally is different than genuinely having two legs and two arms.

coldtea 289 days ago

And every once in a while, my cat.

coldtea 289 days ago

Human-level mobility however is not much to write home about. Just one more variation of the many types seen in animals.

Human level intelligence is, otoh, qualitatively and quantitatively a bigger deal.

oblio 289 days ago

I wouldn't agree completely. Being bipedal frees up the hands for, anything, really.

We're better than most animals because we have tools. We have great tools because we have hands.

imtringued 289 days ago

Birds? Bears whose front paws got injured? https://youtu.be/kcIkQaLJ9r8

oblio 289 days ago

Birds didn't develop hands, neither did bears. Also bears can't walk 100km on their hind legs, but we can.

delusional 289 days ago

Talking about "time to evolve something" seems patently absurd and unscientific to me. All of nature evolved simultaneously. Nature didn't first make the human body and then go "that's perfect for filling the dishwasher, now to make it talk amongst itself" and then evolve intelligence. It all evolved at the same time, in conjunction.

You cannot separate the mind and the body. They are the same physiological and material entity. Trying anyway is of course classic western canon.

coldtea 289 days ago

>Nature didn't first make the human body and then go "that's perfect for filling the dishwasher, now to make it talk amongst itself" and then evolve intelligence. It all evolved at the same time, in conjunction.

Nature didn't make decisions about anything.

But it also absolutely didn't "all evolved at the same time, in conjunction" (if by that you mean all features, regarding body and intelligence, at the same rate).

>You cannot separate the mind and the body. They are the same physiological and material entity

The substrate is. Doesn't mean the nature of abstract thinking is the same as the nature of the body, in the same way the software as algorithm is not the same as hardware, even if it can only run on hardware.

But to the point: this is not about separating the "mind and the body". It's about how you can have humanoid form and all the typical human body functions for millions of years before you get human level intelligence, after much later evolution.

>Trying anyway is of course classic western canon.

It's also classic eastern canon, and several others besides.

delusional 288 days ago

> The substrate is. Doesn't mean the nature of abstract thinking is the same as the nature of the body, in the same way the software as algorithm is not the same as hardware, even if it can only run on hardware.

In this you are positing the existence of a _soul_ that exists separately from the body, and is portable amongst bodies. Analogous to how an algorithm (disembodied software) exists outside of the hardware and is portable amongst it (by embodying it as software).

I don't agree with that at all, though it's impossible to know if you're right. But I can at least understand why you have a hard time with my argument and the east-west difference if the tradition of the existence of a soul is that "obvious" to you.

giardini 289 days ago

Plato's "Allegory of the cave" was uninteresting and uninformative when I first read it more than 50 years ago. It remains so today.

https://en.wikipedia.org/wiki/Allegory_of_the_cave

Also, other than in sculpture/dentistry/medicine I also find "ablation" to not be a particularly insightful metaphor either. Although I see ablation's application to LLMs I simply had to laugh when I first read about it: I envisioned starting with a Greyhound bus and blowing off parts until it was a Lotus 7 sports car!8-). Good luck with that! Kind of like fixing the TV set by kicking it (but it _does_ work sometimes!).

Perhaps we should refrain somewhat from applying metaphors/simile/allegories to describe LLMs relative to human intelligence unless they provide some insight of significant value.

coldtea 289 days ago

>Plato's "Allegory of the cave" was uninteresting and uninformative when I first read it more than 50 years ago. It remains so today.

Anything can be uninteresting and uninformative when one doesn't see its interestingness or can't grok its information.

It however stood for millennia as a great device to describe multiple layers of abstractions, deeper reality vs appearance, and so on, with utility as such in countless domains.

giardini 289 days ago

No. the Allegory is a fragment of a poor unfinished story and little more. You don't need it to explain "multiple layers of abstractions, deeper reality vs appearance" as you say. In fact, you don't need it for anything at all except to explain Plato's "Allegory of the cave". Sheesh.

coldtea says "...with utility as such in countless domains." So when's the last time you referred to the "Allegory of the cave" in your day, other than on HN?

coldtea 289 days ago

>So when's the last time you referred to the "Allegory of the cave" in your day, other than on HN?

Several times. But it was with broadly educated people, not over-specialized one-dimensional ones.

taneq 289 days ago

I don’t think that’s what ablation is about. It’s more like blowing parts off a bus until it ceases to be a bus. Then you find the minimal set of bus parts required to still be a bus, and that’s an indication that those parts are important to the central task of being a bus.
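A toy version of that procedure, with made-up numbers: remove one component at a time, measure how much the score drops, and treat the components with large drops as the ones that matter. The `score` function, the component names and the weights are all hypothetical; a real ablation study would re-evaluate (or retrain) an actual model instead of summing fixed contributions.

```python
def score(enabled, weights):
    """Toy task score: sum of contributions from the components still enabled."""
    return sum(w for name, w in weights.items() if name in enabled)

# Hypothetical component contributions, invented for illustration only.
weights = {"engine": 5.0, "wheels": 4.0, "seats": 0.5, "radio": 0.0}
full = set(weights)
baseline = score(full, weights)

# Ablate one component at a time and record how much the score drops.
drops = {name: baseline - score(full - {name}, weights) for name in weights}

# Components whose removal costs at least 1.0 (an arbitrary threshold).
essential = sorted(name for name, drop in drops.items() if drop >= 1.0)

print(drops)      # {'engine': 5.0, 'wheels': 4.0, 'seats': 0.5, 'radio': 0.0}
print(essential)  # ['engine', 'wheels']
```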

giardini 289 days ago

taneq SAYS "i don’t think that’s what ablation is about. It’s more like blowing parts off a bus until it ceases to be a bus."

Different people have different goals. You want some form of minimal bus and I want a Lotus 7. There's no guarantee either of us reach our goal.

Ablation is about disassembling something randomly, whether little by little or on an arbitrary scale until [SOMETHING INTERESTING OR DESIRABLE HAPPENS].

https://en.wikipedia.org/wiki/Ablation_(artificial_intellige...

Ablation is laughable but sometimes useful. It is also easy, mostly brainless, NOT guaranteed to provide any useful information (so you've an excuse for the wasted resources), and occasionally provides insight. It's a good tool for software engineers who have no (or seek no) understanding of their system, so I think of ablation as a "last resort" solution (another being to randomly modify code until it "works") that I disdain.

But I'm old so I'm probably wrong! Burn those CPU towers down, boys and girls!

dragonwriter 289 days ago

> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work.

We did.

Like, to the point that the AI that radically impacted blue collar work isn't even part of what is considered “AI” any more.

mikeyouse 289 days ago

I think it's Benedict Evans who frequently posts about 'blue collar' AI work not looking like humanoid robots but instead Amazon fulfillment centers keeping track of millions of individual items or tomato picking robots with MV cameras only keeping the ripe ones as it picks at absurd rates.

There are endless corners of the physical world right now where it's not worth automating a task if you need to assign an engineer and develop a software competency as a manufacturing or retail company, but would absolutely be worth it if you had a generalizable model that you could point-and-shoot at them.

noduerme 289 days ago

Or a generalized model to develop them in a virtual sandbox before deploying them physically, which I think is more likely.

don_esteban 288 days ago

I think the bottleneck for this is still the cost of the physical hw of the robot, and its maintenance.

You need a fairly robust one that needs little maintenance, with a multitude of good sensors and precise actuators to be even remotely useful for a sufficiently wide range of tasks (so that you have economies of scale). None of that comes cheap.

chrchr 289 days ago

Part of the answer to this puzzle is that your dishwasher itself is a robot that washes dishes, and has had enormous impact on blue collar jobs since its invention and widespread deployment. There are tons of labor saving devices out there doing blue collar work that we don't think of as robots or as AI.

kushalc 289 days ago

Not a robotics guy, but to the extent that the same fundamentals hold:

I think it's a degrees of freedom question. Given the (relatively) low conditional entropy of natural language, there aren't actually that many degrees of (true) freedom. On the other hand, in the real world, there are massively more degrees of freedom both in general (3 dimensions, 6 degrees of movement per joint, M joints, continuous vs. discrete space, etc.) and also given the path dependence of actions, the non-standardized nature of actuators, kinematics, etc.

All in, you get crushed by the curse of dimensionality. Given N degrees of true freedom, you need O(exp(N)) data points to achieve the same performance. Folks do a bunch of clever things to address that dimensionality explosion, but I think the overly reductionist point still stands: although the real world is theoretically verifiable (and theoretically could produce infinite data), in practice we currently have exponentially less real-world data for an exponentially harder problem.
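A back-of-the-envelope sketch of that O(exp(N)) growth, under a deliberately crude assumption: "covering" the space means one sample per cell of a uniform grid with 10 bins per dimension. The function name and the bin count are arbitrary choices for illustration; the 6 degrees of movement per joint is the figure from the comment above.

```python
def grid_samples(n_dims, bins_per_dim=10):
    """Samples needed to put one point in every cell of a uniform grid."""
    return bins_per_dim ** n_dims

for joints in (1, 2, 4, 8):
    n_dims = 6 * joints  # 6 degrees of movement per joint, as in the comment
    print(f"{joints} joints -> {n_dims} dims -> {grid_samples(n_dims):.1e} cells")
```

Doubling the number of joints squares the number of cells, which is why naive coverage stops being an option almost immediately and people reach for structure, priors and simulation instead.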

Real roboticists should chime in...

jandrewrogers 289 days ago

This understates the complexity of the problem. I have built a career modeling/learning entity behavior in the physical world at scale. Language is almost a trivial case by comparison.

Even the existence of most relationships in the physical world can only be inferred, never mind dimensionality. The correlations are often weak unless you are able to work with data sets that far exceed the entire corpus of all human text, and sometimes not even then. Language has relatively unambiguous structure that simply isn't the norm in real space-time data models. In some cases we can't unambiguously resolve causality and temporal ordering in the physical world. Human brains aren't fussed by this.

There is a powerful litmus test for things "AI" can do. Theoretically, indexing and learning are equivalent problems. There are many practical data models for which no scalable indexing algorithm exists in literature. This has an almost perfect overlap with data models that current AI tech is demonstrably incapable of learning. A company with novel AI tech that can learn a hard data model can demonstrate a zero-knowledge proof of capability by qualitatively improving indexing performance of said data models at scale.

Synthetic "world models" so thoroughly nerf the computer science problem that they won't translate to anything real.

noduerme 289 days ago

But we don't need to know all the things that could happen if M joints moved in every possible way at the same time. We operate within normal constraints. When you see someone trip on a sidewalk and recover before falling on their face, that's still a physical system taking signals and suggesting corrections that could be simulated in a relatively straightforward Newtonian virtual reality, and trained a billion times on with however many virtual joints and actuators.

In terms of "world building", it makes sense for the "world" to not be dreamed up by an AI, but to have hard deterministic limits to bump up against in training.

I guess what I mean is that humans in the world constantly face a lot of conditions that can lead to undefined behavior as well, but 99% of the time not falling on your face is good enough to get you a job washing dishes.

amelius 289 days ago

In other words, self driving cars and robot vacuum cleaners cannot exist. Hmm.

oblio 289 days ago

LOL. Both of those are very limited and work in 2D spaces in highly constrained environments especially designed for them.

FloorEgg 289 days ago

Also not a robotics guy, but that all sounds right to me...

What I do have deep experience in is market abstractions and jobs to be done theory. There are so many ways to describe intent, and it's extremely hard to describe intent precisely. So in addition to all the dimensions you brought up that relate to physical space, there is also the hard problem of mapping user intent to action with minimal "error", especially since the errors can have big consequences in the physical world. In other words, the "intent space" also has many dimensions to it, far beyond what LLMs can currently handle.

On one end of the spectrum of consequences is the robot loads my dishwasher such that there is too much overlap and a bunch of the dishes don't get cleaned (what I really want is for the dishes to be clean, not for the dishes to be in the dishwasher), and on the other end we get the robot that overpowers humanity and turns the universe into paperclips.

So maybe we have to master LLMs and probably a whole other paradigm before robots can really be general purpose and useful.

simne 289 days ago

As far as I can see, classic methods (like those used in teaching children) could create at least an order of magnitude more data than we have now, just by paraphrasing text (classic NLP), though it depends on the language (I'll try to explain).

Text really does have a lot of degrees of freedom, but it depends on the language, and even more on the type of alphabet. Modern English with its phonetic alphabet is the worst choice, because it is the simplest and almost nobody uses second or third hidden meanings (I have heard figures from 2-3 up to 5-6 meanings, depending on the source); hieroglyphic languages are much more information-rich (10-22 meanings); and, interestingly, phonetic languages in totalitarian countries (like Russian) are also much richer (8-12 meanings), because speakers got used to hiding some meanings from the government to avoid punishment.

Language difference (more dimensions) could be an explanation for China's current achievements, superior to the West's, and it could also be a hint about how to boost Western achievements: I mean, use more scientists from Eastern Europe and pay more attention to Eastern European languages.

For 3D robots, I see only one way: a computationally simulated environment.

Earw0rm 289 days ago

Autonomous vehicles are an interesting subset.

Even though the system rules and I/O are tightly constrained, they're still struggling to match human performance in an open-world scenario, after a gigantic R&D investment with a crystal clear path to return.

Fifteen years ago I thought that'd be a robustly solved problem by now. It's getting there, but I think I'll still need to invest in driving lessons for my teenage kids. Which is pretty annoying, honestly: expensive, dangerous for a newly qualified driver, and a massive waste of time that could be used for better things. (OK, track days and mountain passes are fun. 99% of driving is just boring, unnecessary suckage).

What's notable: AVs have vastly better sensors than humans, masses of compute, potentially 10X reaction speed. What they struggle with is nuance and complexity.

Also, AVs don't have to solve the exact same problems as a human driver. For example, parking lots: they don't need to figure out echelon parking or multi-storey lots, they can drop their passengers and drive somewhere else further away to park.

criddell 289 days ago

> in practice we currently have exponentially less real-world data for an exponentially harder problem

Is that where learning comes in? Any actual AGI machine will be able to learn. We should be able to buy a robot that comes ready to learn and we teach it all the things we want it to do. That might mean a lot of broken dishes at first, but it's about what you would expect if you were to ask a toddler to load your dishes into the dishwasher.

My personal bar for when we reach actual AGI is when it can be put in a robot body that can navigate our world, understand spatial relationships, and can learn from ordinary people.

mbac32768 289 days ago

We think this because ten years ago we were all having our minds blown by DeepMind's game playing achievements and videos of dancing robots and thought this meant blue collar work would be solved imminently.

But most of these solutions were more crude than they let on, and you wouldn't really know unless you were working in AI already.

Watch John Carmack's recent talk at Upper Bound if you want to see him destroy like a trillion dollars worth of AI hype.

https://m.youtube.com/watch?v=rQ-An5bhkrs&t=11303s&pp=2AGnWJ...

Spoiler: we're nowhere close to AGI

Hendrikto 289 days ago

> But most of these solutions were more crude than they let on, and you wouldn't really know unless you were working in AI already.

Same with LLMs. Despite having seen this play out before, and being aware of this, people are falling for it again.

uncircle 289 days ago

Thank you for this update. I vividly remember a few years ago the excitement of John Carmack announcing he was retreating into his cave to do some deep work on AGI, pushing the boundaries of the current AI research. I truly appreciate Carmack's intellectual honesty now at announcing "yeah, no, LLMs are not the way to go to recreate anything remotely close to human intelligence." In fact, and I quote him, "we do not even have a line of sight to [the fundamentals of intelligence]."

I'm honestly relieved that one of the brightest minds in computing, with all the resources and desire to create actual super-intelligences, has had to temper hard his expectations.

hackinthebochs 289 days ago

I don't think that quote from Carmack represents some deeply considered conclusion. He started off his efforts with embodiment. He either never considered LLMs a path towards AGI, or thought he didn't personally have anything to contribute to LLMs (he talked about it early on in his journey but I don't remember the specifics). He didn't spend a year investigating LLMs and then decide that they weren't the path to AGI. The point is that he has no special insight regarding LLMs' relationship to AGI and it's misleading to imply that his current effort towards building AGI that eschews LLMs is an expert opinion.

uncircle 289 days ago

Yes, I meant to say that, for Carmack, no type of modern AI research has figured out the path to actual general intelligence. I just didn't want to use the meaningless "AI" buzzword, and these days all the focus and money is on large language models, especially when talking about the end goal of AGI.

zer00eyz 289 days ago

> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work. Not sure exactly where I got the impression, but I remember some "rising tide of AI" analogy and graphic that had artists and scientists positioned on the high ground.

The moment you strip away the magical thinking, the humanization (bugs not hallucinations) what you realize is that this is just progress. Ford in the 1960s putting in the first robot arms vs auto manufacturing today. The phone: from switchboard operators, to mechanical switching to digital to... (I think phone is in some odd hybrid era with text but only time will tell). Draftsmen in the 1970s all replaced by AutoCAD by the 90s. Go further back to 1920, 30 percent of Americans were farmers, today that's less than 2 percent.

Humans, on very human scales are very good at finding all new ways of making ourselves "busy" and "productive".

ACCount37 289 days ago

The big robot AI issue is: no data!

There is a lot of high quality text from diverse domains, there's a lot of audio or images or videos around. The largest robotics datasets are absolutely pathetic in size compared to that. We didn't collect or stockpile the right data in advance. Embodiment may be hard by itself, but doing embodiment in this data-barren wasteland is living hell.

So you throw everything but the kitchen sink at the problem. You pre-train on non-robotics data to squeeze transfer learning for all it's worth, you run hard sims, a hundred flavors of data augmentation, you get hardware and set up actual warehouses with test benches where robots try their hand at specific tasks to collect more data.

And all of that combined only gets you to "meh" real world performance - slow, flaky, fairly brittle, and on relatively narrow tasks. Often good enough for an impressive demo, but not good enough to replace human workers yet.

There's a reason why a lot of those bleeding edge AI powered robots are designed for and ship with either teleoperation capabilities, or demonstration-replay capabilities. Companies that are doing this hope to start pushing units first, and then use human operators to start building up some of the "real world" datasets they need to actually train those robots to be more capable of autonomous operation.

Having to deal with Capital H Hardware is the big non-AI issue. You can push ChatGPT to 100 million devices, as long as you have a product people want to use for the price of "free", and the GPUs to deal with inference demand. You can't materialize 100 million actual physical robot bodies out of nowhere for free, GPUs or no GPUs. Scaling up is hard and expensive.

Hendrikto 289 days ago

> And all of that combined only gets you to "meh" real world performance - slow, flaky, fairly brittle, and on relatively narrow tasks. Often good enough for an impressive demo, but not good enough to replace human workers yet.

Sounds like LLMs to me.

ACCount37 289 days ago

It's like GPT-3.5 - a proof-of-concept tech demo more than a product.

I don't think further improvements are impossible, not at all. They're just hard to get at.

api 289 days ago

Embodiment is 1000x harder from a physical perspective.

Look at how hard it is for us to make reliable laptop hinges or the articulated car door handle trend (started by Tesla) where they constantly break.

These are simple mechanisms compared to any animal or human body. Our bodies last up to 80-100 years through not just constant regeneration but organic super-materials that rival anything synthetic in terms of durability within its spec range. Nature is full of this, like spider silk much stronger than steel or joints that can take repeated impacts for decades. This is what hundreds of millions to billions of years of evolution gets you.

We can build robots this good but they are expensive, so expensive that just hiring someone to do it manually is cheaper. So the problem is that good quality robots are still much more expensive than human labor.

The only areas where robots have replaced human labor are where the economics work, like huge volume manufacturing, or where humans can’t easily go or can’t perform. The latter includes tasks like lifting and moving things thousands of times larger than humans can or environments like high temperatures, deep space, the bottom of the ocean, radioactive environments, etc.

bflesch 289 days ago

The problem is not the robot loading the dishwasher, it is the dishwasher. The dishwasher (and general kitchen electronics) industry has not innovated in a long time.

My prediction is a new player will come in who vertically integrates these currently disjoint industries and products. The tableware used should be compatible with the dishwasher, the packaging of my groceries should be compatible with the cooking system. Like a mini-factory.

But current vendors have no financial incentive to do so, because if you take a step back the whole notion of putting one room of your apartment full with random electronics just to cook a meal once in a blue moon is deeply inefficient. End-to-end food automation is coming to the restaurant business, and I hope it pushes prices of meals so far down that having a dedicated room for a kitchen in the apartment is simply not worth it.

That's the "utopia" version of things.

In reality, we see prices for fast food (the most automated food business) going up while quality is going down. Does it make the established players more vulnerable to disruption? I think so.

lambdaone 289 days ago

This exists already in the form of "ready meals" a.k.a. TV dinners. Fast food shops are already substantially mechanised; huge efforts have been made to robotize cooking, but people are still cheaper to hire. It's still nowhere near the quality of home-cooked food.

bflesch 289 days ago

Yes, there are a lot of garbage microwave food offerings, especially popular with the US population. As a European I'm talking about quality food made with an automated process and end-to-end automation, including ingredient procurement and cleanup.

Not in competition with trash food but with proper food and local ingredients.

criddell 289 days ago

> the whole notion of putting one room of your apartment full with random electronics just to cook a meal once in a blue moon is deeply inefficient

You don't use your kitchen? After the rooms we sleep in, the kitchen is probably the most used space in my home. We are planning an upcoming renovation of our home and the kitchen is where we plan on spending the most money.

> The tableware used should be compatible with the dishwasher

Aside from non-dishwasher safe items, what tableware is incompatible with a dishwasher?

bflesch 289 days ago

Yes, of course I use it a lot. It is a great hobby. But I only use it because it is kind of forced upon us. It's just so inefficient nowadays. Cooking used to be for the whole homestead or for the large family. Now it is mostly only for the immediate family. All the machines are not utilized properly. When people discussed car sharing it was exactly the same argument and I feel it also applies to kitchens.

With the "tableware" argument I meant something like a standardized (magnetic?) adapter for grabbing plates, forks and knives so they can easily be moved by machines/robots.

I feel a company like Ikea is perfectly set up to make this idea a reality, but they'll never do so because they make much more money when every single household buys all these appliances and items for their own kitchen.

Just from the perspective of a single household in a densely populated city I think it'd be nice to have freshly cooked, reproducibly prepared meals with high-quality ingredients available to me. Like an automated soup kitchen with cleanup. Without all the layers of plastic wrapping needed to move produce from large-scale distributors into single-household fridges and so on.

criddell 289 days ago

I think what a lot of people missed when they were talking about shared cars a few years ago is that people seem to mostly like their cars. They spend far more on them than they need to. The average price for a new car is almost $50k now when a vehicle costing half that would satisfy most people's needs.

I'm guessing people mostly overspend on kitchens as well. When our renovation happens, I'm sure we will and I'll feel pretty good about it.

For cars and kitchens, utilization considerations seem to be ranked way, way below things like comfort and convenience and beauty.

petralithic 289 days ago

> 10+ years ago I expected we would get AI that would impact blue collar work long before AI that impacted white collar work.

I'm not sure where people get this impression from, even back decades ago. Hardware is always harder than software. We had chess engines in the 20th century but a robotic hand that could move pieces? That was obviously not as easy because dealing with the physical world always has issues that dealing with the virtual doesn't.

foxglacier 289 days ago

Robots are only harder because they have expensive hardware. We already have robots that can load dishwashers and do other manual work but humans are cheaper so there isn't much of a market for them.

The rising tide idea came from a 1997 paper by Moravec. Here's a nice graphic and subsequent history https://lifearchitect.ai/flood/

Interestingly, Moravec also stated: "When the highest peaks are covered, there will be machines that can interact as intelligently as any human on any subject. The presence of minds in machines will then become self-evident." We pretty much have those today so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore. Even if LLMs end up being strictly more capable than every human on every subject, I'm sure we'll find some new excuse why they don't have minds or aren't really intelligent.

ewoodrich 289 days ago

> Interestingly, Moravec also stated: "When the highest peaks are covered, there will be machines that can interact as intelligently as any human on any subject. The presence of minds in machines will then become self-evident."

> We pretty much have those today so by 1997 standards, machines have minds, yet somehow we moved the goalposts and decided that doesn't count anymore

What you describe as "moving the goalposts" could also just be explained as simply not meeting the standard of "as intelligently as any human on any subject".

Even in the strongest possible example of LLMs' strengths, applying their encyclopedic knowledge and (more limited) ability to apply that knowledge to a given subject, I don't think they meet that bar. Especially if we're comparing to a human over a time period greater than 30 minutes or so.

Quarrelsome 289 days ago

> if you can find a way to produce verifiable rewards about a target world

I feel like there's an interesting symmetry here between the pre and post LLM world, where I've always found that organisations over-optimise for things they can measure (e.g. balance sheets) and under-optimise for things they can't (e.g. developer productivity), which explains why it's so hard to keep a software product up to date in an average org, as the natural pressure is to run it into the ground until a competitor suddenly displaces it.

So in a post LLM world, we have this gaping hole around things we either lack the data for, or as you say: lack the ability to produce verifiable rewards for. I wonder if similar patterns might play out as a consequence and what unmodelled, unrecorded, real-world things will be entirely ignored (perhaps to great detriment) because we simply lack a decent measure/verifiable-reward for it.

w10-1 289 days ago

> rather how we continue to scale for language models at the current LM frontier — 4-8h METR tasks

I wonder if this doesn't reify a particular business model, of creating a general model and then renting it out SaaS-style (possibly adapted to largish customers).

It reminds me of the early excitement over mainframes, how their applications were limited by the rarity of access, and how vigorously those trained in those fine arts defended their superiority. They just couldn't compete with the hordes of smaller competitors getting into every niche.

It may instead be that customer data and use cases are both the most relevant and the most profitable. An AI that could adopt a small user model and track and apply user use cases would have entirely different structure, and would have demonstrable price/performance ratios.

This could mean if Apple or Google actually integrated AI into their devices, they could have a decisive advantage. Or perhaps there's a next generation of web applications that model use-cases and interactions. Indeed, Cursor and other IDE companies might have a leg up if they can drive towards modeling the context instead of just feeding it as intention to the generative LLM.

simne 289 days ago

> if you can find a way to produce verifiable rewards about a target world

I have significant experience modelling the physical world (mostly CFD, but also gamedev, with realistic rigid body collisions and friction).

I admit there is a domain (a range of parameters) where CFD and game physics work just fine; a predictable domain (on the borders of the well-behaved one) where they work well enough but can show strange things; and a domain where you will see a lot of bugs.

And current computing power is so large that even at small-business level (just a median gamer desktop), we could replace more than 90% of real-world tests with simulations in the well-behaved domain (and simply avoid use cases in the unreliable domains).

So I think the main problem is just conservative bosses and investors, who don't trust engineers and don't understand how to check (and tune) simulations against real-world tests, or what the reliable domain is.

olq_plo 289 days ago

Since you seem to know your stuff, why do LLMs need so much data anyway? Humans don't. Why can't we make models aware of their own uncertainty, e.g. feeding the variance of the next token distribution back into the model, as a foundation to guide their own learning. Maybe with that kind of signal, LLMs could develop 'curiosity' and 'rigorousness' and seek out the data that best refines them themselves. Let the AI make and test its own hypotheses, using formal mathematical systems, during training.
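For anyone wanting to see what such an uncertainty signal could look like, here is a minimal sketch. It uses Shannon entropy of the next-token distribution as a stand-in for the "variance" mentioned above (an assumption on my part), and it only computes the signal; it does not feed it back into any model. The logits are made-up toy values and the function names are hypothetical.

```python
import math


def softmax(logits):
    """Convert raw logits to probabilities (shifted by the max for stability)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Toy next-token logits over a 4-token vocabulary (illustrative values only).
confident = softmax([10.0, 0.0, 0.0, 0.0])  # one token dominates
unsure = softmax([1.0, 1.0, 1.0, 1.0])      # uniform over all tokens

print("confident:", entropy_bits(confident))
print("unsure:   ", entropy_bits(unsure))
```

Whether a signal like this is enough to drive anything resembling "curiosity" is exactly the open question.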

mikewarot 289 days ago

My focus lately is on the cost side of this. I believe strongly that it's possible to reduce the cost of compute for LLM type loads by 95% or more. Personally, it's been incredibly hard to get actual numbers for static and dynamic power in ASIC designs to be sure about this.

If I'm right (which I give 50/50 odds), and we can reduce the power of LLM computation by 95%, trillions can be saved in power bills, and we can break the need for Nvidia or other specialists, and get back to general purpose computation.

JumpCrisscross 289 days ago

> there are some cool labs building foundational robotics models, but they're maybe ~5 years behind LMs today

Wouldn't the Bitter Lesson be to invest in those models over trying to be clever about eking out a little more oomph from today's language models (and language-based data)?

amelius 289 days ago

What do you mean by "verifiable rewards"?

Do you mean challenges for which the answer is known?
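One common reading of the term, sketched under assumptions: a reward is "verifiable" when a program can check the model's output against ground truth, so no human labeling is needed. The toy task (integer addition) and the function name are hypothetical, chosen only because the check is trivial.

```python
def arithmetic_reward(a: int, b: int, model_answer: str) -> float:
    """Return 1.0 if model_answer parses to a + b, else 0.0.

    The checker itself computes the correct answer, which is what makes
    the reward verifiable; unparseable answers score 0.0.
    """
    try:
        return 1.0 if int(model_answer.strip()) == a + b else 0.0
    except ValueError:
        return 0.0


print(arithmetic_reward(2, 3, "5"))     # correct answer
print(arithmetic_reward(2, 3, " 6"))    # wrong answer
print(arithmetic_reward(2, 3, "five"))  # not parseable as an integer
```

Because new (a, b) pairs can be generated at will, a checker like this yields as many scored examples as you care to sample.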

eab- 289 days ago

What do you mean about CLIP?

rawgabbit 289 days ago

I believe he is referring to OpenAI's proposal to move beyond training with pure text and instead train with multimodal data. Instead of only the dictionary definition of an apple, train it with a picture of an apple, a video of someone eating an apple, etc.

godshatter 289 days ago

Before this AI wave got going, I'd always assumed that AGI would be more about converting words, pictures, video, and lots of sensory data and who knows what else into a model of concepts that it would be putting together and hypothesizing about and testing as it grows. A database of what concepts have been learned and what data they were built from and what holes it needed to fill in. It would continually be working on this and reaching out to test reality or discuss its findings with people or other AIs instead of waiting for input like a chatbot. I haven't even seen anything like this yet, just ways of faking it by getting better at stringing words together or mashing pixels together based on text tokens.

No one seems to be working on building an AI model that understands, to any real degree, what it's saying or what it's creating. Without this, I don't see how they can even get to AGI.

rawgabbit 289 days ago

When I was young, my relatives would make fun of me, saying I had a lot of book learning but had yet to experience the absurdity of the real world. Wait, they said, until I try to apply my fancy book learning to a world controlled by good ole boys, gatekeepers, and double talk. Then I will learn reality is different from the idealized world of books.

godelski 289 days ago

  > this isn't really about whether scaling is "dead"

I think there's a good position paper by Sara Hooker[0] that mentions some of this. Key point being that while the frontier is being pushed by big models with big data there's a very quiet revolution of models using far fewer parameters (still quite big) and data. Maybe "Scale Is All You Need"[1], but that doesn't mean it is practical or even a good approach. It's a shame these research paths have gotten a lot of pushback, especially given today's concerns about inference costs (this pushback still doesn't seem to be decreasing)

  > verifiable rewards

There's also a current conversation in the community over world models: is it actually a world model if the model does not recover /a physics/[2]. The argument for why they should recover a physics is that this means a counterfactual model must have been learned (no guarantees on if it is computationally irreducible). A counterfactual model gives far greater opportunities for robust generalization. In fact, you could even argue that the study of physics is the study of compression. In a sense, physics is the study of the computability of our universe[3]. Physics is counterfactual, allowing you to answer counterfactual questions like "What would the force have been if the mass had been 10x greater?" If this were not counterfactual we'd require different algorithms for different cases.
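To make the counterfactual point concrete, a minimal sketch that assumes Newton's second law, F = ma, as the recovered "physics" (my choice for illustration; the specific law and the numbers are not from any particular world-model paper). One function answers both the observed case and the "what if the mass had been 10x greater" case, with no separate algorithm per case.

```python
def force(mass_kg: float, accel_m_s2: float) -> float:
    """Newton's second law: force in newtons from mass and acceleration."""
    return mass_kg * accel_m_s2


# Observed situation (illustrative numbers).
observed = force(mass_kg=2.0, accel_m_s2=3.0)

# Counterfactual query: same acceleration, mass 10x greater.
counterfactual = force(mass_kg=20.0, accel_m_s2=3.0)

print("observed force (N):      ", observed)
print("counterfactual force (N):", counterfactual)
```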

I'm in the recovery camp. Honestly I haven't heard a strong argument against it. Mostly "we just care that things work" which, frankly, isn't that the primary concern of all of us? I'm all for throwing shit at a wall and seeing what sticks, it can be a really efficient method sometimes (especially in early exploratory phases), but I doubt it is the most efficient way forward.

In my experience, having been a person who's created models that require orders of magnitude fewer resources for equivalent performance, I cannot stress enough the importance of quality over quantity. The tricky part is defining that quality.

[0] https://arxiv.org/abs/2407.05694

[1] Personally, I'm unconvinced. Despite success of our LLMs it's difficult to decouple other variables.

[2] The "a" is important here. There's not one physics per se. There are different models. This is a level of metaphysics most people will not encounter and has many subtleties.

[3] I must stress that there's a huge difference between the universe being computable and the universe being a computation. The universe being computable does not mean we all live in a simulation.