by YeGoblynQueenne 754 days ago
Sorry, I missed this.

>> Unless i'm misunderstanding what you mean by hidden variables, it's very clear a transformer is regularly learning not just the sequences themselves but what might produce them.

That's what I mean, but I don't think that's happening regularly, or at all. I don't see where the transformer architecture allows for it. Of course, we can claim that any model of a process learned from examples is implicitly modelling the underlying sub-processes. For example, we can claim that a multivariate regression that predicts age at death from demographic data is somehow learning to represent human behaviour. But that's one of those big claims that need big evidence.
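To make the regression example concrete, here is a toy sketch. Everything in it is an assumption made for illustration: the data is synthetic, the hidden variable `h` and the function name `fit_demo` are hypothetical, and ordinary least squares stands in for "multivariate regression". It shows only that a model can predict well from observed features when a hidden variable drives both the features and the target. It doesn't settle whether the fitted coefficients count as a representation of that variable.

```python
import numpy as np

def fit_demo(n=2000, seed=0):
    """Fit ordinary least squares on synthetic data driven by a hidden variable."""
    rng = np.random.default_rng(seed)
    # Hypothetical hidden variable; the regression never sees it directly.
    h = rng.normal(size=n)
    # Three observed "demographic" features, each a noisy copy of h.
    X = np.column_stack([h + 0.5 * rng.normal(size=n) for _ in range(3)])
    # The target ("age at death") depends on the hidden variable plus noise.
    y = 70.0 + 5.0 * h + rng.normal(size=n)
    # Least squares with an intercept column.
    A = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    pred = A @ coef
    # Coefficient of determination on the training data.
    r2 = 1.0 - np.sum((y - pred) ** 2) / np.sum((y - y.mean()) ** 2)
    return coef, r2

coef, r2 = fit_demo()
print("coefficients:", coef)
print("R^2:", r2)
```

The fit is good, but all the model holds is four numbers over the observed features. Whether that amounts to "learning the hidden variable" is exactly the claim that needs evidence beyond predictive accuracy.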

On the two works you link to, I know the one on mechanistic interpretability. As the author says:

"Epistemic status: I feel pretty confident that I have fully reverse engineered this network, and have enough different lines of evidence that I am confident in how it works."

But I'm not at all confident that the author's confidence should instill confidence in me. A clear, direct proof is needed, although of course we can discuss what a proof even means, how much it is a social construct, etc.

I haven't read the other paper. I'm going to bet it's basically data leakage, which is a pervasive problem with most deep learning work and one that suffices to invalidate many big claims about big results. I'll have to read the paper carefully, though.

But, again, what is in the transformer architecture that can predict hidden variables?