by necovek 22 days ago
You seem to be missing their point (which I agree with). The type of intelligence we are equipped with allows us not to have the level of memory an LLM does and still complete tasks that are novel to us every single day. Like navigating a shopping cart through tricky corridors in a store, coming up with a dad joke as in the sibling example, combining a set of tools to achieve something we have never seen before, etc.

LLMs approximate a lot of that very well by simply having seen it before.

Also watch kids develop language: they learn patterns with much less training data than LLMs.


I addressed much of this in my response to a sibling comment, but a few more here:

> novel to us every single day. Like navigating a shopping cart through tricky corridors in a store

We have been practicing navigating the physical world for something like 16hrs/day every day from the moment of our birth. All the sensory data passing through our brains during that time is far larger than any dataset an LLM is trained on.

Humans navigating a shopping cart at a store have likely navigated the physical world before, pushed a shopping cart before, and in combination have navigated stores while pushing shopping carts before. Nevertheless, many still bump into objects all along the way.

Them succeeding at successive variations of store layouts is not novel unless we expand the definition of novel to mean any recombination whatsoever of pre-existing concepts.

I’m certain that with all the intense usage of AI by hundreds of millions of people, there have been countless collections of words passed to LLMs so far that have never before been uttered in exactly such a sequence, let alone in the dataset.

I’m equally certain the LLMs have responded to those words with collections of their own that have also never been uttered in that exact sequence, responding to their unique context.

It is trivial to produce an example of this now yourself if you’d like.

The LLM we’re talking about, mentioned in the OP, has never seen this solution to this problem in its dataset. A large number of brilliant mathematicians were not able to discover this solution. They are themselves expressing that this is a novel breakthrough and had this come from a human it would be treated as such.

If the response to that is “well it’s just recombining concepts it already knows until it finds a solution that works” I would ask how that differs from what humans do?

You missed the core of my point: humans operate, including in the real world, on much less training data. Give a human a shopping cart and ask them to push it backwards, and they'll figure it out in a few minutes even if they've never done it before.

This is the missing bit, which LLMs do approximate amazingly well through sheer training set size, but in my opinion it puts a cap on what novel things they can achieve in comparison with humans.

I've thought about a related "invention space" before: given how much software we write to solve the problems people are facing, why is there no perfect solution for any of them (running a cafe? a CNC machine? ...), and why do we always need more software built to cover one small (novel?) change for a particular owner?

The world space is just so large that you need whatever this intelligence is humans (and animals) have to navigate it successfully — but LLMs do not intrinsically.

Whether they can be so large that it does not matter in 99.99% of cases remains to be seen.

> You missed the core of my point: humans operate, including in the real world, on much less training data.

I very specifically addressed this in my response to you. How much training data is contained in 16 waking hours a day of navigating the world fusing all sensory data, never mind data being simultaneously generated within the mind while this is all going on, from birth until death? From birth until pushing that shopping cart?

Far, far more than in all the training datasets being used for AI.

I also addressed this again in my reply to the sibling comment.

People tend to discount how much data humans have passing through their minds 24/7.

A human isn’t born in a vacuum as a fully formed adult and dropped into the shopping cart navigation problem.

A human has had far, far more training data fed into it that contains all the pieces necessary to translate to pushing a shopping cart when first seeing it, than a machine learning model which has been fed 1 million videos of a robot pushing a shopping cart.

I know I saw Geoffrey Hinton say humans operate with much less training data in a talk.

It doesn't strike me as a claim that should be controversial.

As far as I know nobody can train A.I. to push a shopping cart based on a human child's training set. It's mostly not relevant to the task.

Yeah I'm not sure what the exact context of the statement is.

I am absolutely certain that we have not already discovered, let alone implemented, the best possible learning algorithms. Humans have had more time to evolve, so there's a great chance that we do learn more efficiently and have developed specialized brains that are primed for learning things like how to navigate the physical world on planet Earth as bipeds.

That said, to say that we operate with less training data is just ignoring the reality of all the data we're training on at all times.

If we were to model in lossless fidelity what humans are capable of seeing, hearing, smelling, tasting, feeling, and thinking consciously and subconsciously (essentially all the data flowing through our minds that we are constantly training on every moment of every day, even while we sleep or are unconscious), what sort of bitrate do you think would be required?

Modern LLMs train on datasets in the what, tens of terabytes in size? Let's call it 100 TB.

I would imagine that to losslessly reproduce the full suite of human sensory data (whatever that means for things like taste, touch, smell) would require a bitrate that hits that 100 TB total relatively quickly?
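To put rough numbers on that question, here is a back-of-the-envelope sketch. The 100 TB total and the 16 waking hours per day come from this thread; the bitrates are hypothetical placeholders picked only to span a few orders of magnitude, not measurements of human sensory input, and `waking_days_to_reach` is a name made up for illustration.

```python
# Rough sketch: how long would a data stream take to add up to 100 TB?
# The bitrates below are hypothetical placeholders, not measured sensory rates.

DATASET_BYTES = 100 * 10**12          # the "call it 100 TB" figure from the thread
WAKING_SECONDS_PER_DAY = 16 * 3600    # the 16 hrs/day figure from the thread


def waking_days_to_reach(dataset_bytes: int, bits_per_second: float) -> float:
    """Waking days needed for a stream at the given bitrate to total dataset_bytes."""
    seconds = dataset_bytes * 8 / bits_per_second
    return seconds / WAKING_SECONDS_PER_DAY


# Assumed bitrates, chosen only to show how sensitive the answer is to this input.
for label, bps in [("1 Mbit/s", 1e6), ("10 Mbit/s", 1e7), ("1 Gbit/s", 1e9)]:
    days = waking_days_to_reach(DATASET_BYTES, bps)
    print(f"{label}: {days:,.0f} waking days (~{days / 365.25:.1f} years)")
```

The result swings from a couple of weeks to decades depending on the assumed bitrate, so the comparison hinges almost entirely on what you count as the "data" a person takes in.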

Let's stick to comparing language skills to language skills: at least in my experience with my two kids, they learn word formation patterns before they turn 2 — easy to notice because you see them make mistakes on exceptions.

LLMs needed how much training data to be able to do so?

FWIW, I still see them make up wrong words not following any grammatical pattern, esp in Serbian with less training data.

Serbian is pretty complex though: https://www.languagegrowth.com/en/blog/serbian-grammar-basic... — this made it even more surprising to see the kids pick them up so early when their vocabulary is probably not 2000 words yet.

Hinton says things like

"...we're optimized for having not many experiences. You only live for about a billion seconds—that's assuming you don't learn anything after you're 30, which is pretty much true. So you live for about a billion seconds and you've got a 100 trillion connections. So [you've] got crazily more parameters than you have experiences. So our brains [are] optimized for making the best use of not very many experiences."
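The round numbers in that quote are easy to sanity-check. A minimal sketch, using only the quote's own figures (a billion seconds, 100 trillion connections); the variable names are mine:

```python
# Quick check of the quote's arithmetic; inputs are the quote's round numbers.
SECONDS_LIVED = 1e9                    # "about a billion seconds"
CONNECTIONS = 100e12                   # "100 trillion connections"
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # average year length, including leap days

years = SECONDS_LIVED / SECONDS_PER_YEAR
connections_per_second = CONNECTIONS / SECONDS_LIVED

print(f"{years:.1f} years")
print(f"{connections_per_second:,.0f} connections per second of experience")
```

So a billion seconds is a bit under 32 years, and the ratio works out to roughly a hundred thousand connections for every second lived, which is the sense in which the quote says parameters far outnumber experiences.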

I think this is a disingenuous comparison. When we read a book we can estimate the amount of data we're taking in based on the character count (each character being represented by some fixed number of bits).

What you're suggesting on the other hand is something akin to counting the number of pixels on each page we look at. That's an absurd overestimate of the amount of data a person reading is actually taking in.