by 10GBps 19 days ago
Yep. It's nearly identical to the neural nets we were using in the 90s. Back then even a supercomputer wasn't big enough or fast enough to do what we do today.

I have to wonder though. Is this all a human brain is? A similar thing to an LLM, just scaled exponentially larger? I mean a brain is not just neurons with simple connections to each other. The neurons, axons, dendrites, <insert_unexplained_thing>, etc. in a brain are all holding and processing information in different ways and doing it nearly 100% in parallel. That's a really big model.

The biological discoveries show how complex a biological brain actually is. Even the tiny brains in a bee or spider are able to solve puzzles and use tools. That's crazy.

6 comments

No, it’s definitely not what a human brain is. That makes very little sense. The ways we interact with language (and thus conceptual memory) are completely and fundamentally different.
Is it different though?

If we look beyond written languages, which are late inventions of human civilization, oral languages are continuous and built from blocks, not words.

The Chomskyan school misled the entire field of linguistics for decades by ignoring spoken languages.

It is different, but there may be some universal principles that are relevant more abstractly among both cases. Of particular interest is the empirical notion that statistical models of a certain form will always tend to "average out noise" and "learn meaningful patterns" up to the capacity that those models have for representing said patterns. A parallel notion to this is the hypothesis dubbed "thermodynamic origins of life". The universal principle binding these two seemingly disparate topics is one that seems to underlie any sense of "learning" in physical systems: that semantics of those systems depend on their representational power, and the semantics they do come to represent are the results of adding up many pushes in one "direction" (phase space / state space / etc.) encoding a pattern, and adding up many random noise jiggles will cancel out but give you a first-order sense of variance of those semantic features as expressed by the environment.

As this description is so overly abstract, an exercise for the reader is to try to work through an explanation of how, say, a river delta comes to "learn" about its environment by "reacting" to the influences at its borders, and how it "encodes" whatever it is that it learns in the substrate that it inhabits.
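A smaller warm-up for that exercise, as a rough sketch rather than anything rigorous: the "many pushes add up, random jiggles cancel" idea can be seen by averaging noisy observations of one constant push. The function name `estimate_signal` and the numbers used (a push of 0.5, Gaussian noise with scale 2.0) are assumptions made up for illustration; nothing here models a river delta or any real physical system.

```python
import random
import statistics


def estimate_signal(true_push, noise_scale, n_samples, seed=0):
    """Average n_samples noisy observations of a constant 'push'.

    Each observation is true_push plus zero-mean Gaussian noise.
    Returns (mean, sample standard deviation): the mean approximates the
    consistent push, the standard deviation approximates the noise scale.
    """
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    samples = [true_push + rng.gauss(0.0, noise_scale) for _ in range(n_samples)]
    return statistics.fmean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Hypothetical values: the push is four times smaller than the noise.
    for n in (10, 1_000, 100_000):
        mean, spread = estimate_signal(true_push=0.5, noise_scale=2.0, n_samples=n)
        print(f"n={n:>6}  mean={mean:+.3f}  spread={spread:.3f}")
```

With few samples the push is buried in the noise. With many, the mean settles near the push while the spread settles near the noise scale, which is the "first-order sense of variance" mentioned above.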

Chomsky did the opposite of what you're saying. He didn't ignore spoken language. He said that human vocalization is independent of language, and that the way our brains can manipulate and use sound (a cognitive capability, not specifically an aural one) is the fundamental differentiator that allows us to make compound ideas, and our specific use of language is a byproduct.

Example: a programming language's capability to produce complex software does not come from some inherent quality of language. It comes from binary. 0's and 1's, representing basic logic, and that being built on top of with an abstract "tool" called a language. If the binary logic didn't work, the language wouldn't do anything.

A dolphin can make sounds, and technically has a language, but they can't manipulate or recursively compound concepts (as far as we can tell) in order to create modified ideas. If they could, they probably would have come up with vastly more advanced fishing methods than the (admittedly novel) ones they have now.

But … how close a simulation is it? I can see why people are wondering.
> I mean a brain is not just neurons with simple connections to each other.

No, it's not. There are many animals that have extremely complex and even learned behaviour that have literally zero neurons.

Clearly "neurons" is an oversimplified just-so story, not a scientific theory.

Apparently even single-celled protozoa can show learned trial and error behaviour.
Do you consider fungi animals or do you perhaps mean animals that don't have a brain/CNS?
Yes, protozoans don't have brains and yet they exhibit complex behavior.
In the 90s you didn't have norm layers, residuals, attention, and some more.

So you're missing a lot of the building blocks that make LLMs. It's not a matter of just having the compute.

I think the attention mechanism is so simple but so revolutionary that people forget it.

Like the best leaps in thinking, once it is made, it is immediately obvious and intuitive.

Yes, but it wasn't invented from nothing in 2017. Soft attention already existed in other applications such as information retrieval, and non-local networks had similar ideas as well. It just wasn't seen or used as a fundamental building block. But it wasn't something out of the blue either.
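For anyone who hasn't seen how small the mechanism is, here is a minimal sketch, assuming the attention being discussed is the scaled dot-product form. It is plain Python over lists, with no learned projections, masking, or multiple heads, so it shows the core operation and is not how any real library implements it. The function names are illustrative.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    largest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors.

    For each query: score it against every key, softmax the scores,
    and return the weighted average of the values.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim) for k in keys
        ]
        weights = softmax(scores)
        outputs.append(
            [
                sum(w * v[j] for w, v in zip(weights, values))
                for j in range(len(values[0]))
            ]
        )
    return outputs
```

If every key scores the same against a query, the output is just the plain average of the values. If one key matches much better than the rest, the output is essentially that key's value. Everything in between is a soft lookup.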
Almost everything in ML is like that. It seems so obvious in hindsight. It's maybe what I love most.

Residual connections are so simple, so obvious and so vital. Yet nobody came up with them until 2015?

I suspect it was considered many times, but the sheer computation scale would make it feel like obscene brute force. It feels like the right shape but too wild to think about implementing.

I think as time went on, and hardware got better, it seemed more reasonable to actually think about a viable implementation of what I think was a widespread intuition anyone in ML had that everything's context is everything.

It just seemed like a theoretical thing until hardware caught up. Maybe. Perhaps I'm applying a retrospective excuse to why it took so long.

People definitely wanted to train deep networks before, but didn't know how. They even tried things like training layers independently.

I don't think it was intuitive to anyone back then; the vanishing gradient problem had been a big deal since the dawn of NNs. I'm not sure what you mean by sheer computation: residuals allow you to have deep networks instead of shallow and wide ones, at an equivalent parameter count.
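A toy illustration of the vanishing gradient point, as a sketch under heavy simplification: a chain of scalar tanh layers with one shared weight, so the plain and residual versions have exactly the same parameter count. The function `input_gradient` and the values used (weight 0.5, input 0.1, depth 50) are assumptions picked for illustration, not taken from any paper or framework.

```python
import math


def input_gradient(x, weight, depth, residual):
    """Return d(output)/d(input) for a chain of scalar tanh layers.

    Plain layer:    h -> tanh(weight * h)
    Residual layer: h -> h + tanh(weight * h)
    The gradient is accumulated layer by layer with the chain rule.
    """
    h, grad = x, 1.0
    for _ in range(depth):
        branch = math.tanh(weight * h)
        # Derivative of tanh(weight * h) with respect to h.
        branch_grad = weight * (1.0 - branch * branch)
        if residual:
            h = h + branch
            grad *= 1.0 + branch_grad  # the identity path contributes the 1.0
        else:
            h = branch
            grad *= branch_grad
    return grad


if __name__ == "__main__":
    plain = input_gradient(x=0.1, weight=0.5, depth=50, residual=False)
    skip = input_gradient(x=0.1, weight=0.5, depth=50, residual=True)
    print(f"plain:    {plain:.3e}")
    print(f"residual: {skip:.3e}")
```

In the plain chain every layer multiplies the gradient by at most 0.5, so after 50 layers it is bounded by 0.5 to the 50th power, effectively zero. In the residual chain every factor is at least 1 (for a positive weight), so the signal reaching the first layer never vanishes. Same depth, same weights, only the skip connection differs.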

Attention layers were not used in the 90s.
Probably better not to reduce it to just saying X is Y, then, if it has all that extra complexity and capacity.
LLMs are semiotic infrastructure. You won’t find a better analogy. The cognitive frame won’t hold.