Hacker News
by archgoon 857 days ago
I don't think you've grappled with the point the author is making.

>“If we open up ChatGPT or a system like it and look inside, you just see millions of numbers flipping around a few hundred times a second,” says AI scientist Sam Bowman. “And we just have no idea what any of it means.”

>To me as an engineer, that is just incredibly unsatisfying. Without understanding how something works, we are doomed to be just users.

AIs aren't complicated. They aren't sophisticated math that you can poke at and understand.

They're fucking million-dollar spaghetti code that happens to work (for values of 'work').

Those videos are teaching people "This is an if statement! This is a CPU!" And then you can look at 5.8 billion lines of spaghetti code and say "Gee! I understand how this works now! Yay!"


Asking because I literally do not know: can you step through an AI like you can step through C++ code in a debugger? If you type in the prompt "Draw me a picture of a cat wearing a blue hat", could you (if you wanted to) step through every piece of the AI's process of generating that picture, the way you step through code? If I wanted to understand how a Diffie–Hellman key exchange function worked, I could step through it line by line. It would be deterministic, so I could do the exact same thing again and see the exact same steps.
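For reference, this is roughly what "stepping through" a Diffie–Hellman exchange looks like. It is a minimal sketch with toy numbers (p=23, g=5 and the two private keys are illustrative assumptions, far too small for real use), and `dh_trace` is a hypothetical helper name, not a library function.

```python
def dh_trace(p, g, a, b):
    """Toy Diffie-Hellman exchange that records every intermediate value.

    p, g are the public modulus and generator; a, b are the private keys
    of the two parties. All values here are illustrative only.
    """
    steps = []
    alice_public = pow(g, a, p)           # A = g^a mod p
    steps.append(("alice_public", alice_public))
    bob_public = pow(g, b, p)             # B = g^b mod p
    steps.append(("bob_public", bob_public))
    alice_shared = pow(bob_public, a, p)  # s = B^a mod p
    steps.append(("alice_shared", alice_shared))
    bob_shared = pow(alice_public, b, p)  # s = A^b mod p
    steps.append(("bob_shared", bob_shared))
    return steps


# Running it twice with the same inputs yields the same trace.
first = dh_trace(p=23, g=5, a=6, b=15)
second = dh_trace(p=23, g=5, a=6, b=15)
for name, value in first:
    print(name, value)
```

Every step has a name and a reason you can read off the math, and both sides end up with the same shared value.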
You probably could, but what would you see? A bunch of weights, connections between layers, and more numbers.

You wouldn't see any meaningful, understandable code. There is nothing like "if the prompt begins with 'draw me a picture', jump to layer X".

I'm no expert, but I imagine that's the problem when you try to debug an Algorithmic Intelligence black box.
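To make that concrete, here is a minimal sketch of "stepping through" a tiny neural network. The weights, the layer sizes, and the name `forward_with_trace` are all made up for illustration and don't come from any real model. The point is only that the trace is repeatable, yet every step is just a number.

```python
import math

# Hypothetical, hand-picked weights: 2 inputs -> 2 hidden units -> 1 output.
W1 = [[0.5, -1.2], [0.8, 0.3]]  # one row of input weights per hidden unit
B1 = [0.1, -0.4]                # hidden biases
W2 = [1.5, -0.7]                # output weights
B2 = 0.2                        # output bias


def forward_with_trace(x):
    """Run one forward pass and record every intermediate value."""
    trace = []
    hidden = []
    for j, (row, bias) in enumerate(zip(W1, B1)):
        pre = sum(w * xi for w, xi in zip(row, x)) + bias  # multiply and add
        act = math.tanh(pre)                               # nonlinearity
        trace.append((f"hidden[{j}] pre-activation", pre))
        trace.append((f"hidden[{j}] activation", act))
        hidden.append(act)
    out = sum(w * h for w, h in zip(W2, hidden)) + B2
    trace.append(("output", out))
    return out, trace


out1, trace1 = forward_with_trace([1.0, 2.0])
out2, trace2 = forward_with_trace([1.0, 2.0])
for label, value in trace1:
    print(label, value)
```

You can single-step this in a debugger and get the same values every time, but nothing in the trace says why a given weight is 0.5 or what hidden unit 1 is "for".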

>And then you can look at 5.8 billion lines of spaghetti code

LLMs don't have anywhere near that much code. The algorithms for training and inference are not that complicated; the "intelligent" behavior is entirely due to the weights.
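A toy illustration of that split between code and weights, assuming a single threshold unit rather than anything resembling a real LLM. The function `infer` and the two weight sets are hypothetical. The inference code is identical in both cases, and only the numbers change the behavior.

```python
def infer(weights, bias, inputs):
    """The whole 'inference algorithm': multiply, add, threshold."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


# Two hypothetical weight sets for the same code.
AND_WEIGHTS = ([1.0, 1.0], -1.5)
OR_WEIGHTS = ([1.0, 1.0], -0.5)

pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
and_outputs = [infer(*AND_WEIGHTS, p) for p in pairs]
or_outputs = [infer(*OR_WEIGHTS, p) for p in pairs]
print(and_outputs)
print(or_outputs)
```

Reading `infer` tells you nothing about whether you are looking at AND or OR. That information lives entirely in the weights.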

OP clearly means that the weights are the spaghetti code. Technically they may be data, but if they encode all of the actual functionality of the system, then they are effectively bytecode interpreted by a runtime. You can understand how the runtime works if you care to learn, but you will never understand what's happening below that, nor will anyone else.

Aside from annoying people who want to understand how things work, it also means you can't ever know if you have a fully optimal or correct solution. All you can do is keep throwing money into the training furnace and hope a better solution falls out next time. The whole nature of it gatekeeps out anyone who doesn't have enormous amounts of money to burn.

I can see that, although to me there's a difference between weights and something like bytecode. The weights don't encode any sort of logical operations; they're just numbers that get multiplied and added according to relatively simple algorithms.

Totally agreed that the process of generating and evaluating weights is opaque and not very accessible.

You can simulate any digital circuit by multiplying and adding numbers.
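A minimal sketch of that claim, restricted to bits (0 or 1): NAND can be written as `1 - a * b`, and since NAND is universal, other gates follow from it. The function names and the half adder are illustrative choices, not anything from the thread.

```python
def nand(a, b):
    """NAND on bits using only multiplication and subtraction."""
    return 1 - a * b


def xor(a, b):
    """XOR built from four NAND gates."""
    n = nand(a, b)
    return nand(nand(a, n), nand(b, n))


def half_adder(a, b):
    """Return (sum_bit, carry_bit); the carry is AND, i.e. NOT of NAND."""
    return xor(a, b), 1 - nand(a, b)


pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
results = [half_adder(a, b) for a, b in pairs]
print(results)
```

So "just numbers that get multiplied and added" is enough to express logic. Nothing about the arithmetic alone tells you which circuit it implements.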
But that's exactly the point. The code you are talking about is more like an interpreter for a virtual machine, which then runs a program made up of billions of numbers that wasn't designed by a human, or by any sort of intelligence. You can argue about the end product, but the training process certainly isn't intelligent.
Here, the weights are what's analogous to 5.8 billion lines of spaghetti code when doing inference.