by lisper 785 days ago
> the combination of the two still seems worth investigating

This.

Back in the late 1980's and early 90's the debate-du-jour was between deliberative and reactive control systems for robots. I got my Ph.D. for simply saying that the entire debate was based on the false premise that it had to be one or the other, that each approach had its strengths and weaknesses, and that if you just put the two together the whole would be greater than the sum of its parts. (Well, it was a little more than that. I had to actually show that it worked, which was more work than simply advancing the hypothesis, but in retrospect it seems kinda obvious, doesn't it?)

If I were still in the game today, combining generative-AI and old-school symbolic reasoning (which has also advanced a lot in 30 years) would be the first thing I would focus my attention (!) on.


People have advanced that argument a lot, and it's often worked for a short while; then the statistical models get better.

Chess was a game for humans.

It was very briefly a game for humans and machines (Kasparov had a go at getting "Advanced Chess" off the ground as a competitive sport), but soon enough having a human in the team made the program worse.

But at least the evaluation functions were designed by humans, right? That lasted a remarkably long time; first Stockfish became the strongest engine in the world by using distributed hyperparameter search to tweak its piece-square tables, then AlphaZero came along and used a policy network + MCTS instead of alpha-beta search, then (with an assist from the Shogi community) Stockfish struck back with a completely learned evaluation function via NNUE.
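For readers who haven't met the pieces named above, here is a minimal sketch of alpha-beta search over a toy game tree. It is an illustration under stated assumptions, not how Stockfish or any real engine is written: the nested-list tree, the `evaluate` parameter and the `TOY_TREE` values are all made up for the example. The relevant point is that the evaluation function is a pluggable leaf scorer, so a hand-tuned table and a learned network would slot into the same place.

```python
import math

def alphabeta(node, alpha=-math.inf, beta=math.inf, maximizing=True,
              evaluate=lambda leaf: leaf):
    """Minimax value of `node` with alpha-beta pruning.

    A node is either a list of child nodes or a leaf (anything else).
    `evaluate` scores a leaf; here it is the identity, but it stands in
    for whatever evaluation function an engine uses (hypothetical hook).
    """
    if not isinstance(node, list):
        return evaluate(node)
    if maximizing:
        best = -math.inf
        for child in node:
            best = max(best, alphabeta(child, alpha, beta, False, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:   # the minimizing parent will never allow this line
                break
        return best
    best = math.inf
    for child in node:
        best = min(best, alphabeta(child, alpha, beta, True, evaluate))
        beta = min(beta, best)
        if alpha >= beta:       # the maximizing parent already has something better
            break
    return best

# Toy tree: the root maximizes, its three children minimize.
TOY_TREE = [[3, 5], [2, 9], [0, 1]]

visited = []                    # records which leaves actually get evaluated
def counting_eval(leaf):
    visited.append(leaf)
    return leaf

best = alphabeta(TOY_TREE, evaluate=counting_eval)
print(best, visited)
```

On this toy tree the leaves 9 and 1 are never evaluated, which is the whole trick: the search skips branches that cannot change the result.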

So the last frontier of human expertise in chess is search heuristics, and that's going to fall too: https://arxiv.org/abs/2402.04494.

The common theme with all of this is that the techniques we used before are, fundamentally, hacks to get around _not having enough compute_, and they make the system worse once you no longer have to make those tradeoffs around inductive biases. Empirical evidence suggests that raw scaling has a long way to run yet.

I find myself not wanting to agree with you, but deep down I think you're right.

AI greatly reminds me of the Library of Babel thought experiment. If we can imagine a library with every book that can possibly be written in any language, would it contain all human knowledge lost in a sea of noise? Is there merit or value in creating a system that sifts through such a library to attune hidden truths, or are we dooming ourselves to finding meaning in nothingness?

In a certain sense, there's immense value to developing concepts and ideas through intuition and thought. In another sense, a rose by any other name smells just as sweet; if an AI creates a perpetual motion device before a human does, that's not nothing. I don't expect AI to speed past human capability like some people do, but it's certainly displaced a lot of traditional computer-vision and text generation applications.

> If we can imagine a library with every book that can possibly be written in any language, would it contain all human knowledge lost in a sea of noise? Is there merit or value in creating a system that sifts through such a library to attune hidden truths, or are we dooming ourselves to finding meaning in nothingness?

The work that system would be required to find those "hidden truths" is equivalent to re-deriving those truths from scratch.

Similar argument: an image is just a number; if you take e.g. a 800x600 24bpp picture, that's a number 1 440 000 bytes long; you could hypothetically start from 0 and generate every 1 440 000-byte number, thus generating every possible 800x600 24bit image. In that set, you'd find every historical event, photographed at every moment from every angle, and even photos of every fragment of every book from the Library of Babel. But good luck finding anything particular in there.
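To make the arithmetic concrete, here is a small sketch of the "image is just a number" idea. The function names and the two-pixel example are assumptions made for illustration; only the 800x600, 24bpp figures come from the comment above.

```python
import math

WIDTH, HEIGHT, BYTES_PER_PIXEL = 800, 600, 3   # 24 bits per pixel

def image_to_int(pixels: bytes) -> int:
    """Read the raw pixel bytes as one big-endian integer."""
    return int.from_bytes(pixels, "big")

def int_to_image(n: int, size: int) -> bytes:
    """Inverse mapping: recover the raw bytes of a `size`-byte image."""
    return n.to_bytes(size, "big")

size = WIDTH * HEIGHT * BYTES_PER_PIXEL        # bytes per image
total_bits = 8 * size
# There are 2**total_bits distinct images. Count the decimal digits of that
# number via logarithms instead of building the integer itself.
decimal_digits = math.floor(total_bits * math.log10(2)) + 1

# Round trip on a tiny 2x1 image: one red pixel, one blue pixel.
tiny = bytes([255, 0, 0, 0, 0, 255])
n = image_to_int(tiny)
restored = int_to_image(n, len(tiny))

print(size, decimal_digits, restored == tiny)
```

The count of possible images has roughly 3.5 million decimal digits. The index that picks out one particular image is exactly as long as the image itself, which is why knowing where to look is the same as already having the picture.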

Similar argument 2: any movie or song is contained somewhere within the digit expansion of the number Pi (assuming Pi is a normal number, which is widely believed but unproven). But again, it's worthless unless you know how to find such works, which basically requires you to have them in the first place.

> then the statistical models get better

Maybe. The statistical models are definitely better at natural language processing now, but they still fail on analytical tasks.

Of course, human brains are statistical models, so there's an existence proof that a sufficiently large statistical model is, well, sufficient. But that doesn't mean that you couldn't do better with an intelligently designed co-processor. Even humans do better with a pocket calculator, or even a sheet of paper, than they do with their unaided brains.

If human brains are statistical models, why are human brains so bad at statistics?

Edit: btw, same for probabilistic inference, same for logical inference, and same for any other thing anyone's tried as the one true path to AI since the 1950's. Humans have consistently proven bad at everything computers are good at, and that tells us nothing about why humans are good at anything (if, indeed, we are). Let's not assume too much about brains until we find the blueprint, eh?

> why are human brains so bad at statistics?

That depends on what you mean by being "bad at statistics." What brains do on a conscious level is very different than what they do at a neurobiological level. Brains are "bad at statistics" on the conscious level, but at the level of neurobiology that's all they do.

As an analogy, consider a professional tennis or baseball player. At the neurobiological level those people are extremely good at finding solutions to kinematic equations, but that doesn't mean that they would ace a physics test.

That is a very big assumption -that brains have conscious and subconscious levels that are good and bad at different things- that needs to be itself proved, before it can be used to support any other line of inquiry.

I'm not well versed in the relevant literature at all but my understanding is that research in the area points in the completely opposite direction: that humans playing baseball, for example, do not find solutions to kinematic equations, but instead use simple heuristics that exploit our senses and body configuration, like placing their hands in front of their eyes so that they line up with the ball etc.

This makes a lot more sense, not only for humans playing tennis, but for animals surviving in the wild, finding sustenance and shelter, and mates, while avoiding becoming a meal. Consider the Portia spider [1], a spider-hunting spider, itself prey to other hunting spiders, with a brain consisting of a few tens of thousands of neurons and still perfectly capable not only of navigating complex environments in all three space dimensions but also making complex plans involving detours.

Just think of how quickly a spider must be able to think when it hunts, and is hunted by, other spiders, some of the most deadly predators in the animal kingdom. There is not a snowball's chance in hell that such an animal has the time to solve kinematic equations with a few KBs of neurons. Absolutely no chance at all.

For that and many other reasons like it, it looks very unlikely to me that human brains, or any brains, work the way you say. In any case, that sounds positively Freudian and I don't mean that as an insult, but I so could.

______________

[1] My favourite. No, I don't mean meal. I just love this paper; it's almost the best paper in autonomous robotics and planning that I've ever read:

https://www.frontiersin.org/journals/psychology/articles/10....

> That is a very big assumption -that brains have conscious and subconscious levels that are good and bad at different things- that needs to be itself proved, before it can be used to support any other line of inquiry.

You can't be serious. Do you really doubt that hand-eye coordination and solving systems of kinematic equations on paper using math are disjoint skills? That one can be good at one without being good at the other? That there is in actual fact an inverse correlation between these skills? How do you account for the fact that even people who have never studied math or physics can learn to throw and catch a ball?

> That is a very big assumption -that brains have conscious and subconscious levels that are good and bad at different things- that needs to be itself proved, before it can be used to support any other line of inquiry.

Does this assumption itself need to be proven?

Besides, it's not true: you can simply define it as an assumption within a thought experiment and proceed merrily along, or you can just not bother to consider whether one's premises are true in the first place, and proceed merrily along.

The second option tends to be more popular in my experience, perhaps because it is so much easier, and perhaps for some other reasons also.

> If human brains are statistical models, why are human brains so bad at statistics?

If CPUs are made of silicon, why are they so bad at simulating semiconductors? Or why are CPUs so bad at emulating CPUs?

If JavaScript runs on a CPU, why is it so bad at doing bitwise stuff?

Etc.

What the runtime is made of is entirely separate from what's running on it. The same goes for the human brain (substrate) and human consciousness (software), or humans (substrate), bureaucracy (runtime) and corporations (software).

Your question implies it is obvious that a system of statistical models would (or should) be good at statistics. And that the opposite is a paradox. I would ask why you think that is obvious?

Being good at statistics is more of a knowledge graph of understanding concepts than a statistical model, I think.

Just like understanding a car engine.

That's the "bitter lesson", right? Which is really a sour lesson, as in sour grapes. See, Rich Sutton's point with his Bitter Lesson is that encoding expert knowledge only improves performance temporarily, and that improvement is eventually surpassed by more data and compute.

There are only two problems with this: One, statistical machine learning systems have an extremely limited ability to encode expert knowledge. The language of continuous functions is alien to most humans and it's very difficult to encode one's intuitive, common sense knowledge into a system using that language [1]. That's what I mean when I say "sour grapes". Statistical machine learning folks can't use expert knowledge very well, so they pretend it's not needed.

Two, all the loud successes of statistical machine learning in the last couple of decades are closely tied to minutely specialised neural net architectures: CNNs for image classification, LSTMs for translation, Transformers for language, Diffusion models and GANs for image generation. If that's not encoding knowledge of a domain, what is?

Three, because of course three, despite point number two, performance keeps increasing only as data and compute increase. That's because the minutely specialised architectures in point number two are inefficient as all hell; the result of not having a good way to encode expert knowledge. Statistical machine learning folk make a virtue out of necessity and pretend that only being able to increase performance by increasing resources is some kind of achievement, whereas it's exactly the opposite: it is a clear demonstration that the capabilities of systems are not improving [2]. If capabilities were improving, we should see the number of examples required to train a state-of-the-art system either staying the same, or going down. Well, it ain't.

Of course the neural net [community] will complain that their systems have reached heights never before seen in classical AI, but that's an argument that can only be sustained by the ignorance of the continued progress in all the classical AI subjects such as planning and scheduling, SAT solving, verification, automated theorem proving and so on.

For example, and since planning is high on my priorities these days, see this video where the latest achievements in planning are discussed (from 2017).

https://youtu.be/g3lc8BxTPiU?si=LjoFITSI5sfRFjZI

See particularly around this point where he starts talking about the Rollout IW(1) symbolic planning algorithm that plays Atari from screen pixels with performance comparable to Deep-RL; except it does so online (i.e. no training, just reasoning on the fly):

https://youtu.be/g3lc8BxTPiU?si=33XSM6yK9hOlZJnf&t=1387

Bitter lesson my sweet little ass.

____________

[1] Gotta find where this paper was but none other than Vladimir Vapnik basically demonstrated this by trying the maddest experiment I've ever seen in machine learning: using poetry to improve a vision classifier. It didn't work. He's spent the last 20 years trying to find a good way to encode human knowledge into continuous functions. It doesn't work.

[2] In particular their capability for inductive generalisation which remains absolutely crap.

Yeah, that's one of the papers in that line of research by Vapnik. He's got a few with similar content. Visually, it's not the paper I remember, I'll have to read it again to be sure.

If I remember correctly, Vapnik's point is, we know that Big Data Deep Learning works; now, try to do the same thing with small data. Very much like my point that capabilities of models are not improving, only the scale increasing.

> The language of continuous functions is alien to most humans and it's very difficult to encode one's intuitive, common sense knowledge into a system using that language

In other words, machine-learned models are octopus brains (https://www.scientificamerican.com/article/the-mind-of-an-oc...) and that creeps you out. Fair enough, it creeps me out too, and we should honour our emotions (I'm no rationalist), but we should also be aware of the risks of confusing our emotional responses with reality.

Please don't god mode me? Machine learning doesn't creep me out. I'm sorry it creeps you out. In my culture, octopus is a prized delicacy, my dad used to fish them out of the sea with his bare hands when I was a kid. If you wanna creep me out, you should try snake, not octopus.

>Two, all the loud successes of statistical machine learning in the last couple of decades are closely tied to minutely specialised neural net architectures: CNNs for image classification, LSTMs for translation, Transformers for vision, Diffusion models and GANs for image generation. If that's not encoding knowledge of a domain, what is?

Transformers and diffusion for vision and image generation are really odd examples here. None of those architectures or training processes were designed with vision in mind lol. It was what, 3 years after the 2017 attention paper before the famous ViT paper? CNNs have lost a lot of favor to ViTs, and LSTMs are not the best-performing translators today.

The bitter lesson is that less encoding of "expert" knowledge results in better performance and this has absolutely held up. The "encoding of knowledge" you call these architectures is nowhere near that of the GOFAI kind and even more than that, less biased NN architectures seem to be winning out.

>That's because the minutely specialised architectures in point number two are inefficient as all hell; the result of not having a good way to encode expert knowledge.

Inefficient is a whole lot better than can't even play the game, the story of GOFAI for the last few decades.

>If capabilities were improving, we should see the number of examples required to train a state-of-the-art system either staying the same, or going down. Well, they ain't.

The capabilities of models are certainly increasing. Even your example is blatantly wrong. Do you realize how much more data and compute it would take to train a Vanilla RNN to say GPT-3 level performance?

>> Inefficient is a whole lot better than can't even play the game, the story of GOFAI for the last few decades.

See e.g. my link above where GOFAI plays the game (Atari) very well indeed.

Also see Watson winning Jeopardy (a hybrid system, but mainly GOFAI - using frames and Prolog for knowledge extraction, encoding and retrieval).

And Deep Blue beating Kasparov. And MCTS still the SOTA search algo in Go etc.

And EURISKO playing Traveller as above.

And Pluribus playing Poker with expert game-playing knowledge.

And the recent neuro-symbolic DeepMind thingy that solves geometry problems from the maths olympiad.

etc. etc. [Gonna stop editing and adding more as they come to my mind here.]

And that's just playing games. As I say in my comment above, planning and scheduling, SAT, constraints, verification, theorem proving: those are still dominated by classical systems and neural nets suck at them. Ask Yann LeCun: "Machine learning sucks". He means it sucks in all the things that classical AI does best and he means he wants to do them with neural nets, and of course he'll fail.

> And MCTS still the SOTA search algo in Go etc

It's often forgotten that Rich Sutton said the two things which work are learning (the AlphaGo/Leela Zero policy network) and search (MCTS). (I think the most interesting research in ML is around the circumstances in which large models wind up performing implicit search.)

Well, gradient optimisation is a form of search.

That was a figure of speech. I didn't literally mean games (not that GOFAI performs better than NNs in those games anyway). I simply went off your own examples - Vision, Image generation, Translation etc.

>As I say in my comment above planning and scheduling, SAT, constraints, verification, theorem proving- those are still dominated by classical systems

You can use NNs for all these things. It wouldn't make a lot of sense, because GOFAI would be perfect and the NNs would be inefficient, but you certainly could, which is again more than I can say for GOFAI and the domains you listed.

I don't understand your comment. Clarify.

As it is, your comment seems to tell me that neural nets are good at neural net things and GOFAI is good at GOFAI things, which is obvious, and is what I'm saying: neural nets can make only very limited use of expert knowledge and so suck in all domains where domain knowledge is abundant and abundantly useful, which are the same domains where GOFAI dominates. GOFAI can make very good use of expert knowledge but is traditionally not as good in domains where only tacit knowledge is available, because we don't understand the domain well enough yet, like in anything to do with pattern recognition, which is the same domains where neural nets dominate. If explicit, expert knowledge was available for those domains, then GOFAI would dominate, and neural nets would fall behind, completely contrary to what Sutton thinks.

So, the bitter lesson is only bitter for those who are not interested in what classical AI systems can do best. For those of us who are, the lesson is sweet indeed: we're making progress, algorithmic progress, progress in understanding, scientific progress, and don't need to burn through thousands of credits to train on server farms to do anything of note. That's even a running joke in my team: hey, do you need any server time? Nah, I'll run the experiment on my laptop over lunch. And then beat the RL algo (PPO) that needs three days training on GPUs. To solve mazes badly.

Addendum:

>> Do you realize how much more data and compute it would take to train a Vanilla RNN to say GPT-3 level performance?

Oh, good point. And what would GPT-3 do with the typical amount of data used to train an LSTM? Rhetorical.

Yeah, all of those architectures are _themselves_ hacks to get around having insufficient compute! They absolutely were encoding inductive biases into the network to get around not being able to train enough, and transformers (handwaving hard enough to levitate, the currently-trainable model family with the least inductive bias) have eaten the world in all domains.

This is evidence _for_ the Bitter Lesson, not against it.

They haven't (eaten the world etc). They just happen to be the models that trend hard right now. I bet if you could compare like for like you'd be able to see some improvement in performance from Transformers, but that'd be extremely hard to separate from the expected improvement from the constantly increasing amounts of data and compute. For example, you could, today, train a much bigger and deeper Multi-Layered Perceptron than you could thirty years ago, but nobody is trying because that's so 1990's, and in any case they have the data and compute to train much bigger, much more inefficient (contrary to what you say if I got that right) architectures.

Wait a few years and the Next Big Thing in AI will come along, hot on the heels of the next generation of GPUs, or tensor units or whatever the hardware industry can cook up to sell shovels for the gold rush. By then, Transformers will have hit the plateau of diminishing returns, there'll be gold in them there other hills and nobody will talk of LLMs anymore because that's so 2020s. We've been there so many times before.

> much more inefficient

The tricky part here is that "efficiency" is not a single dimension! Transformers are much more "efficient" in one sense, in that they appear to be able to absorb much more data before they saturate; they're in general less computationally efficient in that you can't exploit symmetries as hard, for example, at implementation time.

Let's talk about that in terms of a concrete example: the big inductive bias of CNNs for vision problems is that CNNs essentially presuppose that the model should be translation-invariant. This works great (it speeds up training and makes it more stable) until it doesn't and that inductive bias starts limiting your performance, which happens in the large-data limit.
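A small sketch of that inductive bias, in plain Python so it runs anywhere. Everything here is illustrative: the 1-D signal, the kernel, the wrap-around boundary and the weight matrix `W` are assumptions chosen to keep the example exact. Strictly speaking, a convolution layer on its own is translation-equivariant (shift the input and the output shifts with it); invariance appears once you pool over positions. A dense layer with unrelated weights per position has neither property, which is the sense in which it carries less bias.

```python
def circular_conv(signal, kernel):
    """1-D cross-correlation with wrap-around; the same weights are used at every position."""
    n = len(signal)
    return [sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
            for i in range(n)]

def shift(signal, k):
    """Cyclically shift a list to the right by k positions."""
    k %= len(signal)
    return list(signal[-k:]) + list(signal[:-k]) if k else list(signal)

def dense(signal, weights):
    """Fully-connected layer: a separate weight for every (output, input) pair."""
    return [sum(w * x for w, x in zip(row, signal)) for row in weights]

x = [0, 1, 3, 0, 0, 2]
kernel = [1, -2, 1]
W = [[i * len(x) + j for j in range(len(x))] for i in range(len(x))]  # arbitrary weights

# Equivariance: convolving a shifted input equals shifting the convolved output.
conv_is_equivariant = circular_conv(shift(x, 2), kernel) == shift(circular_conv(x, kernel), 2)

# Invariance after global max pooling: the pooled value ignores the shift.
pooled_is_invariant = max(circular_conv(shift(x, 2), kernel)) == max(circular_conv(x, kernel))

# The dense layer does not commute with shifts for these weights.
dense_is_equivariant = dense(shift(x, 2), W) == shift(dense(x, W), 2)

print(conv_is_equivariant, pooled_is_invariant, dense_is_equivariant)
```

The convolution gets its shift behaviour for free from weight sharing, so it never has to learn it from data. The dense layer would have to discover it, which costs data and compute but also leaves it free to learn position-dependent structure if that is what the data contains.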

Fully-connected NNs are more general than transformers, but they have _so many_ degrees of freedom that the numerical optimization problem is impractical. If someone figures out how to stabilize that training and make these implementable on current or future hardware, you're absolutely right that you'll see people use them. I don't think transformers are magic; you're entirely correct in saying that they're the current knee on the implementability/trainability curve, and that can easily shift given different unit economics.

I think one of the fundamental disconnects here is that people who come at AI from the perspective of logic down think of things very differently to people like me who come at it from thermodynamics _up_.

Modern machine learning is just "applications of maximum entropy", and to someone with a thermodynamics background, that's intuitively obvious (not necessarily correct! just obvious): in a meaningful sense the _universe_ is a process of gradient descent, so "of course" the answer for some local domain models is maximum-entropy too. In that world view, the higher-order structure is _entirely emergent_. I'm, by training, a crystallographer, so the idea that you can get highly regular structure emerging from merciless application of a single principle is just baked into my worldview very deeply.

Someone who comes at things from the perspective of mathematical logic is going to find that worldview very weird, I suspect.