Hacker News
by xcv123 1139 days ago
> There's no thinking, no reasoning, no calculation, no logic, no deduction, no intelligence, no anything. It's only token, token, token.

False. The neural network inside the transformer LLM contains a hierarchical semantic model, and has inferred some rules of reasoning from the training set. It can apply those rules to new input.

There are semantic layers above the "token, token, token".

Explore them here: https://openaipublic.blob.core.windows.net/neuron-explainer/...


What you're posting here simply repeats, without critical intent, the baseless claims connectionists have made about their systems for many decades. Those claims have been criticised just as often, but connectionists simply ignore the criticisms and continue with the same old nonsense, as if nothing happened. For example, that ridiculous conceit that their systems have "neurons", or that the weights of functions in a neural net somehow represent semantic categories recognised by humans. These are all complete fantasies.

If you are not aware of the long history of debunking such fabrications, I suggest you start here:

Connectionism and Cognitive Architecture: A Critical Analysis

https://ruccs.rutgers.edu/images/personal-zenon-pylyshyn/pro...

We are talking about artificial neurons here. Not biological neurons. These are mathematical structures.

https://en.wikipedia.org/wiki/Artificial_neuron

These models infer semantic categories that correlate with categories in the human mind, to the extent that they can solve natural language understanding tasks.

No one is saying they are biological neurons, or that they model semantics exactly as the human mind would. It is mechanical pattern recognition that approximates our understanding.

You can browse those artificial neurons online and view their associations.

You're just saying words without ever explaining why. What am I supposed to do about that? There's nothing to argue with if you're just repeating nonsensical claims without even trying to support them.

For example:

>> It is mechanical pattern recognition that approximates our understanding.

That's just a claim and you're not even saying why you make it, what makes you think so, etc.

> That's just a claim and you're not even saying why you make it, what makes you think so, etc.

Mechanical - it is an algorithm, not a living being.

Pattern recognition - a branch of machine learning that focuses on the detection and identification of regularities and patterns in data. It involves classifying or categorizing input data into identifiable classes based on extracted features. The patterns recognized could be in various forms, such as visual patterns, speech patterns, or patterns in text data.

Approximates our understanding - meaning the model's representation is similar to, but not the same as, human understanding.

When I say 'mechanical pattern recognition that approximates our understanding,' what I mean is that large language models (LLMs) like GPT-4 learn patterns from the vast amounts of text data they're trained on. These patterns correspond to various aspects of language and meaning.

For example, the models learn that the word 'cat' often appears in contexts related to animals, pets, and felines, and they learn that it's often associated with words like 'meow' or 'fur'. In this sense, the model 'understands' the concept of a cat to the extent that it can accurately predict and generate text about cats based on the patterns it has learned.

This isn't the same as human understanding, of course. Humans understand cats as living creatures with certain behaviors and physical characteristics, and we have personal experiences and emotions associated with cats. A language model doesn't have any of this - its 'understanding' is purely statistical and based on text patterns.

The evidence for these claims comes from the performance of these models on various tasks. They can generate coherent, contextually appropriate text, and they can answer questions, translate languages, and perform other language-related tasks with a high degree of accuracy. All of this suggests that they have learned meaningful patterns from their training data.

That is not "evidence" of anything. It's just assumptions. You keep saying what you think is going on without ever saying how or why. You are not describing any mechanisms and you are not explaining any observations.

I have a suggestion: try to convince yourself that you are wrong; not right. Science gives you the tools to know when you're wrong. If you're certain you're right about something then you're probably wrong and you should keep searching until you find where and how.

For example, try to trace in your mind the mechanisms and functionality of language models, and see where your assumptions about their abilities come from.

Good luck.

Your suggestion of trying to convince oneself of being wrong is a valuable one and reflects the scientific method. I agree that it's important to continually challenge and scrutinize our own beliefs and assumptions.

Let's delve deeper into the mechanics of language models. Large language models like GPT-4 are built on the transformer architecture. It is composed of stacked layers of self-attention, which let the model weigh how relevant each token in the input is to every other token when predicting the next one.
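To make the "weighing" concrete, here is a minimal sketch of scaled dot-product self-attention in plain Python. It is a simplification I am assuming for illustration: real transformers apply learned query/key/value projections, use many heads, and mask future tokens. The toy vectors are made up.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(vectors):
    """Single-head self-attention with no learned projections:
    queries, keys and values are all the input vectors themselves."""
    d = len(vectors[0])
    outputs, all_weights = [], []
    for q in vectors:
        # Score every token against the current one (scaled dot product).
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in vectors]
        weights = softmax(scores)
        # The output is a weighted average of all token vectors.
        out = [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(d)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical 2-d token vectors: the first two are similar, the third is not.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
outputs, weights = self_attention(toy)
```

In this toy input the first token puts more weight on the second token than on the third, because their vectors point in nearly the same direction.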

When the model is trained, it adjusts the weights in its network to minimize the difference between its predictions and the actual words in its training data. This process is guided by a loss function and an optimization algorithm.
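As a rough sketch of that loop, assume the loss is cross-entropy and the optimizer is plain gradient descent, and shrink the "model" down to three logits over a three-word vocabulary for one fixed context. Everything here (the vocabulary size, the learning rate, the target index) is a made-up toy, not how any particular LLM is configured.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Negative log-probability assigned to the word that actually came next.
    return -math.log(softmax(logits)[target])

def sgd_step(logits, target, lr=0.5):
    # For softmax + cross-entropy the gradient w.r.t. the logits
    # is (predicted probabilities - one-hot target).
    probs = softmax(logits)
    grad = [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]
    return [w - lr * g for w, g in zip(logits, grad)]

logits = [0.0, 0.0, 0.0]   # the model starts out with no preference
target = 2                 # index of the word observed in the training data
losses = []
for _ in range(20):
    losses.append(cross_entropy(logits, target))
    logits = sgd_step(logits, target)
```

The loss starts at log(3), the value for a uniform guess over three words, and falls on every step as the logits shift toward the observed word.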

Through this training process, the model learns to represent words and phrases as high-dimensional vectors, also known as embeddings. These embeddings capture many aspects of the words' meanings, including their syntactic roles and their semantic similarities to other words.

When the model generates text, it turns its internal representation of the previous words into a probability distribution over possible next tokens, then picks or samples one. That distribution reflects the patterns and regularities the model has learned from its training data.
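A minimal sketch of that last step, assuming greedy decoding (always take the highest-probability word) and assuming the score of each candidate is a dot product between a context vector and the word's embedding. The three-word vocabulary and all the numbers are hypothetical.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def next_word(hidden, embeddings):
    """Score each vocabulary word against the context vector and
    return the most likely word plus the full distribution."""
    words = list(embeddings)
    logits = [sum(h * e for h, e in zip(hidden, embeddings[w])) for w in words]
    probs = softmax(logits)
    best = max(range(len(words)), key=lambda i: probs[i])
    return words[best], dict(zip(words, probs))

# Hypothetical 2-d embeddings and a context vector standing in for "the cat says".
embeddings = {"meow": [1.0, 0.0], "bark": [0.0, 1.0], "fur": [0.6, 0.6]}
hidden = [2.0, 0.1]
word, probs = next_word(hidden, embeddings)
```

Deployed systems usually sample from the distribution rather than always taking the top word, which is why the same prompt can yield different continuations.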

Of course, this is a high-level description and the actual process involves a lot of complex mathematics and computation. But I hope it gives you a better sense of the mechanisms behind these models.

As for evidence, there are numerous studies that have evaluated these models on a wide range of tasks, including text generation, question answering, translation, and more. These studies consistently show that these models perform well on these tasks, often achieving state-of-the-art results. This is empirical evidence that supports the claim that these models have learned meaningful patterns from their training data.

I agree that we should always remain skeptical and open to new evidence and alternative explanations. I welcome any specific criticisms or alternative hypotheses you might have about these models and their capabilities.

Your disagreement seems to be a philosophical one. It is not a technical argument. It seems that you won't accept that semantics can be modelled by an unconscious mechanical system. I am talking about mathematical concepts of semantics, not "true" human semantics that are the product of human insight and consciousness. https://en.wikipedia.org/wiki/Semantic_similarity

While AI doesn't have an innate understanding of the world as humans do, the semantic representations it learns from vast amounts of text data can be surprisingly rich and detailed. It can capture associations and nuances that are not immediately apparent from a purely syntactic analysis of the text.

Oh come on. "Semantic similarity" is just heuristic bullshit. It's not a scientific term, or even a mathematical concept. Don't try to pull rank on me without even knowing who I am or what I do just because you can read wikipedia.

And note you're still not saying "why" or "how", only repeating the "what" of someone else's claim.

I understand your skepticism, and I acknowledge that the concept of semantic similarity is indeed an approximation. However, it is an approximation that has proven highly useful in a wide range of practical applications.

Semantic similarity methods are based on the idea that the meaning of a word can be inferred from its context, which is the starting point of distributional semantics. In essence, words that occur in similar contexts tend to have similar meanings. This is not just a heuristic; it is a well-established principle in linguistics, known as the distributional hypothesis.
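Here is a small sketch of the distributional hypothesis in action, using nothing but co-occurrence counts and cosine similarity. The five-sentence corpus and the window size are assumptions I made up for illustration; LLM embeddings are learned very differently, but the underlying intuition is the same.

```python
import math
from collections import Counter, defaultdict

def cooccurrence_vectors(sentences, window=2):
    """Represent each word by counts of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors

def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[k] * b[k] for k in a if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Toy corpus: "cat" and "dog" share contexts, "car" does not.
corpus = [
    "the cat chased the mouse",
    "the dog chased the ball",
    "the cat ate the food",
    "the dog ate the food",
    "the car needs new tires",
]
vecs = cooccurrence_vectors(corpus)
cat_dog = cosine(vecs["cat"], vecs["dog"])
cat_car = cosine(vecs["cat"], vecs["car"])
```

On this corpus "cat" comes out closer to "dog" than to "car", purely because of the contexts they appear in. Nobody told the program that cats and dogs are both animals.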

In the case of large language models, they are trained on vast amounts of text data and learn to predict the next word in a sentence given the previous words. Through this process, they learn to represent words as high-dimensional vectors (word embeddings) that capture many aspects of their meaning, including their semantic similarity to other words.

These models can generate coherent text, answer questions, translate languages, and perform other language-related tasks with a high degree of accuracy. These capabilities wouldn't be possible if the models were only capturing syntax and not semantics.

The 'why' is that these models learn from the statistical regularities in their training data, which encode both syntactic and semantic information. The 'how' is through deep learning algorithms and architectures like transformers, which allow the models to capture complex patterns and relationships in the data.

I hope this provides a more detailed explanation of my argument. I'm not trying to 'pull rank', but simply explaining the basis for my claims. I understand this is a complex topic, and I appreciate your challenging questions as they help clarify and deepen the discussion.