| HN Mirror

by mvkel 766 days ago

Just because you can make something doesn't mean you know why it's made.

There are thousands of people around the world trying to reverse engineer what is going on in the billions or trillions of parameters in an LLM.

It's a field called "Mechanistic Interpretability." The people who do the work jokingly call it "cursed" because it is so difficult and they have made so little progress so far.

Literally nobody can predict what capabilities new models will have before they are released.

And then, months after a model is released, people discover new abilities in it, such as decent chess playing.

They are black boxes.

2 comments

irthomasthomas 766 days ago

I predict that this is largely an illusion caused by the datasets and training regimes not being published.

It's also an artefact of how evals have been done on a pass/fail basis, so that an LLM that gets 90% of a question right is just as much a failure as one that gets 0% of it right.

So skills appear to emerge suddenly and surprisingly only because of the flawed way that we are forced to study them. Consider the training regime and partial success towards a goal, and emergence is far less prevalent. There was a paper on that recently; I'll see if I can find it.
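
To make the pass/fail point concrete, here is a rough toy sketch, not taken from any paper. It assumes a model whose per-token accuracy improves smoothly, an answer that is 10 tokens long, and token errors that are independent of each other (all three are my assumptions, and the function names are made up for illustration). Under exact-match grading the score is the per-token accuracy raised to the answer length, so it sits near zero for a long time and then shoots up, while a partial-credit score just tracks the smooth improvement.

```python
# Toy illustration: smooth per-token improvement looks like sudden
# "emergence" under pass/fail grading. All numbers are hypothetical.

def exact_match_score(per_token_accuracy: float, answer_length: int) -> float:
    """Pass/fail grading: the answer counts only if every token is right.

    Assumes token errors are independent, so the chance of a fully
    correct answer is per_token_accuracy ** answer_length.
    """
    return per_token_accuracy ** answer_length


def partial_credit_score(per_token_accuracy: float) -> float:
    """Partial-credit grading: expected fraction of tokens that are right."""
    return per_token_accuracy


ANSWER_LENGTH = 10  # assumed answer length in tokens

# Hypothetical per-token accuracies for successively larger models.
accuracies = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95, 0.99]

rows = [
    (p, partial_credit_score(p), exact_match_score(p, ANSWER_LENGTH))
    for p in accuracies
]

print("per-token  partial-credit  exact-match")
for p, partial, exact in rows:
    print(f"{p:9.2f}  {partial:14.2f}  {exact:11.3f}")
```

In this toy setup, the model at 0.5 per-token accuracy scores under 0.1% on exact match and the one at 0.9 scores roughly 35%, even though the underlying improvement between them is gradual.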

mvkel 766 days ago

Until less than 5 years ago, AI was almost entirely an academic field, and a theoretical one at that.

Those same academics themselves admit that they're surprised at how well LLMs do considering how simple(?) rudimentary(?) the logic underneath is.

I don't quite understand what you're saying. That these academics were being lazy by not properly investigating/publishing their findings? That doesn't seem right.

alsetmusic 765 days ago

They may be black boxes, but that doesn't change the fact that they are operating on statistics. I see no evidence that "AI" so far is anywhere near cracking reasoning. It doesn't matter how magical their inner workings are. They have been trained to spit out plausible text and images (and, to a more limited extent, video).

All of that is imitation, nothing near thought.
