by RustySpottedCat 644 days ago
Can someone explain exactly what the "unknown" of neural networks is? We built them, we know what they consist of and how they work. Yes, we can't map out every single connection between nodes in this "multilayer perceptron", but don't we know how these connections are formed?
8 comments

SOTA LLMs like GPT-4o can natively understand b64-encoded text. Now, we have algorithms that can encode and decode b64 text. Is that what GPT-4o is doing? Did training learn that algorithm? Clearly not, or at least not completely, because typos in b64 that would destroy any chance of our algorithms extracting meaning from the original text are barely an inconvenience for 4o.
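To make the fragility of a conventional decoder concrete, here is a small sketch using Python's standard `base64` module. The sample text and the `decode_lenient` helper are made up for illustration. The only point is that deleting a single character shifts the 6-bit alignment, so an ordinary decoder garbles everything after the typo.

```python
import base64

# Hypothetical sample text, chosen only for illustration.
text = b"Attack at dawn, not dusk"
encoded = base64.b64encode(text).decode("ascii")


def decode_lenient(s: str) -> bytes:
    """Drop trailing characters so the length is a multiple of 4, then decode."""
    usable = s[: len(s) - len(s) % 4]
    return base64.b64decode(usable)


# Simulate a typo: delete the fifth character (index 4).
typo = encoded[:4] + encoded[5:]
damaged = decode_lenient(typo)

print(text)     # the original bytes
print(damaged)  # begins with b"Att", then turns into unrelated bytes
```

The first four Base64 characters are untouched, so the first three bytes survive; every later 4-character group is now misaligned by one character.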

So how is it decoding b64 then? We have no idea.

We don't build neural networks. Not really. We build architectures and then train them. Whatever they learn is outside the scope of human action beyond supplying the training data.

What they learn is largely unknown beyond trivial toy examples.

We know connections form, we can see the weights, we can even see the matrices multiplying. We don't know what any of those calculations are doing. We don't know what they mean.

Would an alien understand C code just because he could see it executing?

Our DNA didn't build our brain. Not really. Our DNA coded for a loose trainable architecture with a lot of features that result from emergent design, constraints of congenital development, et cetera. Even if you include our full exome, a bunch of environmental factors in your simulation, and are examining a human with obscenely detailed tools at autopsy, you're never going to be able to tell me with any authenticity whether a given subject possesses the skill 'skateboarding'.
I find this analogy kind of confusing? Wouldn’t the analogous thing be to say that our DNA doesn’t understand, uh, how we are able to skateboard? But like, we generally don’t regard DNA as understanding anything, so that's not unexpected.

Where does “we can’t tell whether a person possesses the skill of ‘skateboarding’” fit in with DNA not encoding anything specific to skateboarding? It isn’t as if we designed our genome, such that if our genome did hard-code skateboarding skill we would (as designers of our genome) have full understanding of how skateboarding skill works at the neuron level.

I recognize that a metaphor/analogy/whatever does not have to extend to all parts of something, and indeed most metaphors/analogies/whatever fail at some point if pushed too far. But, I don’t understand how the commonalities you are pointing to between [NN architecture : full NN network with the specific weights] and [human genome : the whole behavior of a person’s brain including all the facts, behaviors, etc. that they’ve learned throughout their life] is supposed to apply to the example of _knowing_that_ a person knows how to skateboard?

It is quite possible that I’m being dense.

Could you please elaborate on the analogy / the point you are making with the analogy?

The brain is just an example of a system we are all running whose baseline mechanics we understand, but in which any task much more complex than breathing is accomplished through a novel self-organizing structure built up over a lot of iteration. Other than very broad-strokes regional distinctions, the brain is not organized by some plan that existed before construction, and is not composed of intelligible dedicated circuits that we can observe postmortem with perfect information.

The sheer number, variety, and networking of synapses involved in the skill 'skateboarding' is irreducibly, unintelligibly complex for an intelligence on the scale of a conscious human mind to describe, fully comprehend, or even recognize with a great deal of analysis. Even if you decoded all the functional pathways through the network in one example, you would not be able to decode another, because every skateboarder has trained their neural network in a unique manner.

> the brain is not organized by some plan that existed before construction, and is not composed of intelligible dedicated circuits that we can observe postmortem with perfect information.

Well said. You've reminded me of a beautiful sci-fi short story about almost exactly this "mystery":

https://www.lightspeedmagazine.com/fiction/exhalation/

Base64 encoding is very simple: it just takes each 6 bits of the input and encodes (replaces) them as one of the 64 (2^6) characters A-Za-z0-9+/. If the input is 8-bit ASCII text, then every 3 input characters will be encoded as 4 Base64 characters (3 * 8 = 24 bits = 4 * 6-bit Base64 chunks).
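The mapping described above can be sketched in a few lines of Python. This is an illustrative sketch, not a replacement for the standard library: `b64encode_sketch` is a made-up name, and the `=` padding for inputs whose length is not a multiple of 3 is the standard convention rather than something spelled out above.

```python
import base64

# The 64-character alphabet: A-Z, a-z, 0-9, +, /
ALPHABET = (
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789+/"
)


def b64encode_sketch(data: bytes) -> str:
    """Encode bytes by splitting each 24-bit group into four 6-bit indices."""
    out = []
    for i in range(0, len(data), 3):
        chunk = data[i:i + 3]
        pad = 3 - len(chunk)  # 0, 1 or 2 missing bytes in the last group
        n = int.from_bytes(chunk + b"\x00" * pad, "big")
        chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
        if pad:
            # Characters that only cover padding bytes are written as '='.
            chars[-pad:] = "=" * pad
        out.extend(chars)
    return "".join(out)


print(b64encode_sketch(b"Man"))  # TWFu
# Cross-check against the standard library implementation.
print(b64encode_sketch(b"hello") == base64.b64encode(b"hello").decode("ascii"))
```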

So, this is very similar to an LLM having to deal with tokenized input, but instead of sequences of tokens representing words you've got sequences of Base64 characters representing words.

It's not about how simple B64 is or isn't. In fact, I chose a simple problem we've already solved algorithmically on purpose. It's that everything you've just said, reasonable as it may sound, is entirely speculation.

Maybe "no idea" was a bit much for this example but any idea certainly didn't come from seeing the matrices themselves fly.

That's not entirely true in the case of base64 because of how statistical patterns within natural languages work. For example, you can use frequency analysis to decrypt a monoalphabetic substitution cipher on pretty much any language if you have a frequency table for character n-grams of the language, even with small numbers for n. This is a much more shallow statistical processing than what's going on within an LLM so I don't think many were surprised that a transformer stack and attention heads could decode base64. Especially if there were also examples of base64-encoding in the training data (even without parallel corpora for their encodings).
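As a rough illustration of how shallow this kind of statistical processing can be, here is a sketch that breaks a Caesar shift, the simplest special case of a monoalphabetic substitution cipher, using only single-letter frequencies. The frequency table holds approximate textbook values for English, and the function names are made up for this example. A general substitution cipher would need a search over letter mappings scored with longer n-grams, which this sketch does not attempt.

```python
import string

# Approximate relative frequencies (percent) of letters in English text.
ENGLISH_FREQ = {
    "a": 8.17, "b": 1.49, "c": 2.78, "d": 4.25, "e": 12.70, "f": 2.23,
    "g": 2.02, "h": 6.09, "i": 6.97, "j": 0.15, "k": 0.77, "l": 4.03,
    "m": 2.41, "n": 6.75, "o": 7.51, "p": 1.93, "q": 0.10, "r": 5.99,
    "s": 6.33, "t": 9.06, "u": 2.76, "v": 0.98, "w": 2.36, "x": 0.15,
    "y": 1.97, "z": 0.07,
}


def shift_text(text: str, k: int) -> str:
    """Shift lowercase letters by k positions; leave other characters alone."""
    out = []
    for ch in text:
        if ch in string.ascii_lowercase:
            out.append(chr((ord(ch) - ord("a") + k) % 26 + ord("a")))
        else:
            out.append(ch)
    return "".join(out)


def english_score(text: str) -> float:
    """Higher when the text's letters are common in English."""
    return sum(ENGLISH_FREQ.get(ch, 0.0) for ch in text)


def crack_caesar(ciphertext: str) -> tuple[int, str]:
    """Try all 26 shifts and keep the candidate that looks most like English."""
    candidates = ((k, shift_text(ciphertext, -k)) for k in range(26))
    return max(candidates, key=lambda pair: english_score(pair[1]))


plain = ("it was the best of times it was the worst of times "
         "it was the age of wisdom it was the age of foolishness")
key, recovered = crack_caesar(shift_text(plain, 7))
print(key, recovered == plain)
```

No dictionary or grammar is involved; the letter statistics alone are enough to pick out the right shift for a text of this length.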

It doesn't explain higher level generalizations like being a transpiler between different programming languages that didn't have any side-by-side examples in the training data. Or giving an answer in the voice of some celebrity. Or being able to find entire rhyming word sequences across languages. These are probably more like the kind of unexplainable generalizations that you were referring to.

I think it may be better to frame it in terms of accuracy vs precision. Many people can explain accurately what an LLM is doing under all those matrix multiplies, both during training and inference. But, precisely why an input leads to the resulting output is not explainable. Being able to do that would involve "seeing" the shape of the hypersurface of the entire language model, which as sibling commenters have mentioned is quite difficult even when aided by probing tools.

Huh? I just pointed out what Base64 encoding actually is - not some complex algorithm, but effectively just a tokenization scheme.

This isn't speculation - I've implemented Base64 decode/encode myself, and you can google for the definition if you don't believe I've accurately described it!

The speculation here is not about what b64 text is. It's about how the LLM has learnt to process it.

Edit: Basically, for all anyone knows, it treats b64 as another language entirely, and decoding it is, inside the network, more akin to translating French than to the very simple swapping you've just described.

LLMs, just like all modern neural nets, are trained via gradient descent which means following the most direct path (steepest gradient on the error surface) to reduce the error, with no more changes to weights once the error gradient is zero.
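A toy version of that update rule, as a sketch only: plain gradient descent on a one-parameter quadratic loss. The loss function, learning rate, and step count are arbitrary choices for illustration; real LLM training uses stochastic minibatches and far more parameters, but the step "move against the gradient, and stop moving when the gradient is zero" is the same idea.

```python
def loss(w: float) -> float:
    """Toy error surface with its minimum at w = 3."""
    return (w - 3.0) ** 2


def grad(w: float) -> float:
    """Derivative of the toy loss with respect to w."""
    return 2.0 * (w - 3.0)


def gradient_descent(w0: float, lr: float = 0.1, steps: int = 100):
    """Repeatedly step against the gradient, recording each value of w."""
    w = w0
    history = [w]
    for _ in range(steps):
        w -= lr * grad(w)  # the update shrinks as the gradient approaches zero
        history.append(w)
    return w, history


w_final, history = gradient_descent(0.0)
print(w_final, loss(w_final), grad(w_final))
```

As `w` approaches 3 the gradient shrinks toward zero, so the updates shrink with it and the weight effectively stops changing.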

Complexity builds upon simplicity, and the LLM will begin by noticing the direct (and repeated without variation) predictive relationship between Base64 encoded text and corresponding plain text in the training set. Having learnt this simple way to predict Base64 decoding/encoding, there is simply no mechanism whereby it could change to a more complex "like translating French" way of doing it. Once the training process has discovered that Base64 text decoding can be PERFECTLY predicted by a simple mapping, then the training error will be zero and no more changes (unnecessary complexification) will take place.

We don't know what each connection means, what information is encoded in each weight. We don't know how it would behave differently if each of the million or trillion weights was changed.

Compare this to a dictionary, where it's obvious what information is on each page and each line.

Skipping some detail: the model applies many high-dimensional functions to the input, and we don't know the reasoning for why these functions solve the problem. Reducing the dimension of the weights to human-readable values is non-trivial, and multiple neurons interact in unpredictable ways.

Interpretability research has resulted in many useful results and pretty visualizations[1][2], and there are many efforts to understand Transformers[3][4] but we're far from being able to completely explain the large models currently in use.

[1] - https://distill.pub/2018/building-blocks/

[2] - https://distill.pub/2019/activation-atlas/

[3] - https://transformer-circuits.pub/

[4] - https://arxiv.org/pdf/2407.02646

The brain serves as a useful analogy, even though LLMs are not brains. Just as we can’t fully understand how we think by merely examining all of our neurons, understanding LLMs requires more than analyzing their individual components, though decoding LLMs is most likely easier, which doesn't mean easy.
We know how they are formed (and how to form them); we don't know why forming in that particular way solves the problem at hand.

Even this characterization is not strictly valid anymore; there is a great deal of research into what's going on inside the black box. The problem was never that it was a black box (we can look inside at any time), but that it was hard to understand. KANs (Kolmogorov-Arnold Networks) help put some of that into mathematical form. Generating mappings of activations over data similarly grants insight.

* Given the training data and the architecture of the network, why does SGD with backprop find the given f, rather than any other of an infinite set?

* Why is there a set of f, each with 0 loss, that work?

* Given the weight space, and an f within it, why/when is a task/skill defined as a subset of that space covered by f?

I think a major reason why these are hard to answer is that it's assumed that NNs are operating within an inferential statistical context (i.e., reversing some latent structure in the data). But they're really bad at that. In my view, they are just representation-builders that find proxy representations in a proxy "task" space (approximate definition: proxy = "shadow of some real structure, as captured in an unrelated space").

We know the process to train a model, but when a model makes a prediction we don't know exactly "how" it predicts the way it does.

We can use the economy as an analogy. No single person really understands the whole supply chain. But we know that each person in the supply chain is trying to maximize their own profit, and that ultimately delivers goods and services to a consumer.

There’s a ton of research going into analysing and reverse engineering NNs; this “they’re mysterious black boxes and forever inscrutable” narrative is outdated.