Hacker News
by orbital-decay 698 days ago
This keeps coming up again and again, and the answers are pretty generic.

1. Large language models as a concept are not going anywhere anytime soon for any reason. Simply because there's no other source with a huge slice of human psyche encoded into it than the language itself and the corpus of texts in it. Humanity collectively did a massive amount of gradient descent on the language over generations, and it will stay as the primary source. That doesn't mean that other sources don't exist, of course.

2. Dataset quality matters at least as much as the architecture. There's plenty of low-hanging fruit available in preprocessing the data and "textbooks for models". You learn to count in a decimal system from both memorizing the number sequence and the explanation of the algorithm, not just by looking at millions of examples! There's also plenty of slightly higher-hanging fruit available in hardware improvements and optimizations.

3. Calling a transformer a token predictor, stochastic parrot, autocomplete on steroids, etc. is of course right but kind of misses the point, like calling the human brain a nerve impulse predictor (and the brain also has no "inherent way of verifying whether their predictions are correct", using the definition from the article). Reasoning about this in ill-defined terms like "understanding" or "knowledge" or "intelligence" is not useful at all. There are many differences between humans and LLMs, but the most high-level one is that humans are autonomous agents that exist in continuous time, while a transformer's lifetime is the time required to compute a single token. Repeat the process for multiple tokens and you have something more complex. Add an external loopback, and you have a chatbot with memory, partly capable of doing things unexpected of a "word predictor". Make the loopback more complex, and you suddenly have an... autonomous system that exists in continuous time. Sure, it's extremely crude and primitive, and that loopback probably also needs to be replaced by something way more advanced in the future, and, and, and, and...

4. Reasoning and symbolic computation comparable to human abilities (which are also pretty spotty and error-prone) might or might not emerge as a result of scale and simple loopback mechanisms in models. You might or might not need an external symbolic engine as the author says, or maybe you can reduce it to another model of a different type, or maybe it's all wrong. Current models are still orders of magnitude smaller and simpler than the human nervous system, and plenty of things in LLMs have already changed simply by increasing the scale.

5. Other than all of the above, sure - transformers or another flavor-of-the-year architecture might give way to more advanced ones. But the basic principles will remain, and language models are not going anywhere.
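
The loopback from point 3 can be made concrete with a toy sketch. Everything here is an assumption for illustration: a bigram lookup table stands in for the transformer, and the names train_bigram, next_token, generate and chat are made up, not taken from the article or any library. The only point is the shape of the loops: one stateless step that emits a single token, an inner loop that feeds tokens back in, and an outer loop where the transcript is the only memory.

```python
from collections import Counter, defaultdict


def train_bigram(corpus):
    """Count which token follows which. A stand-in for a trained model."""
    table = defaultdict(Counter)
    tokens = corpus.split()
    for prev, nxt in zip(tokens, tokens[1:]):
        table[prev][nxt] += 1
    return table


def next_token(table, context):
    """One stateless step: context in, one token out. Nothing persists."""
    if not context:
        return None
    followers = table.get(context[-1])
    if not followers:
        return None
    # Greedy choice keeps the sketch deterministic.
    return followers.most_common(1)[0][0]


def generate(table, prompt, max_tokens=5):
    """Inner loop: each predicted token is appended and fed back as context."""
    context = list(prompt)
    for _ in range(max_tokens):
        token = next_token(table, context)
        if token is None:
            break
        context.append(token)
    return context[len(prompt):]


def chat(table, turns, max_tokens=3):
    """Outer loopback: the growing transcript is the only memory between turns."""
    transcript = []
    for user_tokens in turns:
        transcript.extend(user_tokens)
        transcript.extend(generate(table, transcript, max_tokens))
    return transcript
```

Calling generate(train_bigram("one two three four"), ["one"]) walks the chain one token at a time until the table has nothing left to predict. A real system would replace the table with a model and the transcript with something smarter, but the control flow is the same idea.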

1 comments

> huge slice of human psyche encoded into it than the language itself and the corpus of texts in it

This type of wording is problematic because it treats what is written as representative of our psyche when it is not.

Psyche implies thinking and thoughts that have occurred when the brain accessed those concepts from outside our 3D world, processed them internally, vocalized them into sounds and then finally letters.

LLMs are just doing pattern text search on top of what is written; they are not doing any sort of reasoning or accessing of the hyperdimensional plane like our brain does when it thinks or reasons with concepts.

Our brains are not some exotic token or neurosymbolic search engines!

Differences in particular representations or the generation process are not very interesting; what matters is the stuff that is encoded. And as I said, calling a model a token predictor is right, but kind of misses the forest for the trees. It's an argument on a lower abstraction level that is not very useful.

The biological capabilities of a single human are also not very impressive, by the way. 90% (made up number) of what you consider your intelligence is actually the result of biological evolution and social processes accumulating and abstracting knowledge over endless generations. A hypothetical you raised without any contact with other humans, society, culture, or education would be substantially different. So the processes are not just in your brain.

Whether you or I are doing "reasoning" is a matter of definition, and it's a really vague term. If you try to define it with more precision, you might come up with the idea that all we do is post-rationalize the result of our blind prediction.

> This type of wording is problematic because it conflates what is written as representative of our psyche when it does not.

It definitely is representative, in some way. Human civilization did a huge amount of combined computation to encode human behavior (personal, social, all kinds) into abstractions/semantics hidden in the language and text. Surely it can be recovered with some precision by statistical analysis and some computation. Which is what a large language model does.

Of course this "reverse engineering" approach has limitations. The model might not be able to generalize well enough to pick up higher level semantics. It might be architecture-limited. Some data might just not be in the dataset. The model will never be able to 100% copy humans without having an extremely precise biological reference, just as you'll never be able to copy a dolphin, an alien, or a model. But having an artificial human is not the point of this, and the achievable precision might be just good enough.

You are fixated on the output and what we can do within that limited set of data, while ignoring the thought processes that were involved in outputting that data as well as interpreting it.

Without that "thinking" portion, it is simply mimicking to the point that it resembles thinking while no such activity is happening (thinking as I defined it: accessing the conscious hyperdimensional cloud, which we humans can do easily).

Intelligence in the English vocabulary is limited to retrieval, which seems to be why there is so much push towards LLMs, but this is like trying to dance to a painting: you can interpret it as music and mimic dance moves, but it's different when a human hears the music and moves naturally.

That sounds... suspiciously like attributing inherent magic to humans. Who cares in practice that the mechanisms under the hood are different if the output is the same? Why exactly are you sure humans do all these vaguely defined things inherently, and that it's not an emergent property? What even makes you think you do that? Individual capabilities are vastly overrated.

And the goal is not copying humans to begin with, just interfacing with them and being aligned just enough.