Hacker News
by blazespin 2339 days ago
GPT2 does learn, right.

I wonder how much of our knowledge of math is self-attention and how much is something else.

For example, most of what I do when I do calculus is self-attention. When I solve a calculus problem, I generally don't think through the squeeze theorem; I just apply cookbook math.

My current model for the brain is consciously driven self-attention. I.e., 80-90% of what we do is just self-attention and our conscious brain checks to see how right/interesting it is around 10-20% of the time.

The key therefore really is training your brain on the right data.

I find this model explains quite a lot about people and the way they behave / succeed.

2 comments

> GPT2 does learn, right.

I meant the usual restriction of current DL models, where training and inference are separate. Humans update constantly. Think of code review: you have a model in your head of what the code you have written does, a reviewer spots some mistake, and it turns out your model was incorrect. You adjust, and while you're at it you fix the same kind of mistake in several other places too. GPT2 would be none the wiser. At best the human could prompt it for its top list of completions instead of the single most likely one and see if it comes up with something more useful, but again, it wouldn't update its weights.

And a human can also figure out by how much to update: a low-probability event means not much adjustment is needed, while a serious error needs bigger adjustments.
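To make the distinction concrete, here is a toy sketch, not how GPT2 actually works internally. It assumes a made-up three-token "model" that is just a dict of logits; `top_k` and `online_update` are hypothetical helper names. Asking for the top list leaves the weights alone, while an online step (plain gradient descent on cross-entropy, my choice for illustration) changes them, and the size of the step scales with how wrong the prediction was.

```python
import math

def softmax(logits):
    # Convert raw scores into probabilities (shifted by the max for stability).
    m = max(logits.values())
    exps = {tok: math.exp(v - m) for tok, v in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_k(logits, k):
    # Inference only: rank candidates, never touch the weights.
    probs = softmax(logits)
    return sorted(probs.items(), key=lambda kv: kv[1], reverse=True)[:k]

def online_update(logits, observed, lr=1.0):
    # One gradient step on cross-entropy for the token that actually occurred.
    # Returns new logits plus the step size applied to the observed token.
    probs = softmax(logits)
    new_logits = {
        tok: v + lr * ((1.0 if tok == observed else 0.0) - probs[tok])
        for tok, v in logits.items()
    }
    step = lr * (1.0 - probs[observed])
    return new_logits, step

# Hypothetical toy "weights": one logit per candidate completion.
logits = {"foo": 2.0, "bar": 0.5, "baz": -1.0}
frozen_before = dict(logits)

candidates = top_k(logits, 2)                      # "prompt it for its top list"
_, small_step = online_update(logits, "foo")       # prediction was nearly right
updated, big_step = online_update(logits, "baz")   # prediction was badly wrong
```

In this sketch the model is a pure function of its logits, so `top_k` can be called as often as you like without anything being learned; only `online_update` produces different weights.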

> My current model for the brain is consciously driven self-attention. I.e., 80-90% of what we do is just self-attention and our conscious brain checks to see how right/interesting it is around 10-20% of the time.

Well, sure, the brain has lots of low-level automation. But the devil is in those "consciously driven" details.

The thing GPT2 doesn't have is some kind of iterative cognitive model, where text is continually modified and re-examined. It also doesn't have any integration with memory, either long-term or short-term.
That doesn't seem particularly hard to add.
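Roughly the kind of loop I have in mind, as a toy sketch only. Everything here is a stand-in I made up: `revise` plays the role of the model call, `toy_review` plays the role of whatever re-examines the text, `long_term` is a plain dict of remembered corrections, and `short_term` is a small window of recent drafts.

```python
from collections import deque

def revise(draft, known_fixes):
    # Stand-in for a model call: apply every correction remembered so far.
    for wrong, right in known_fixes.items():
        draft = draft.replace(wrong, right)
    return draft

def iterate(draft, review, long_term, max_rounds=5):
    # Repeatedly revise and re-examine the text until the reviewer is satisfied.
    short_term = deque(maxlen=3)  # only the most recent drafts are kept
    for _ in range(max_rounds):
        draft = revise(draft, long_term)
        short_term.append(draft)
        issue = review(draft)
        if issue is None:
            break
        wrong, right = issue
        long_term[wrong] = right  # remember the correction for later texts
    return draft, list(short_term)

def toy_review(draft):
    # Hypothetical reviewer: reports the first known misspelling it finds.
    for wrong, right in (("teh", "the"), ("recieve", "receive")):
        if wrong in draft:
            return wrong, right
    return None

long_term = {}
final, history = iterate("teh server will recieve teh request", toy_review, long_term)

# A later text benefits from what was learned, even with no reviewer feedback.
second, _ = iterate("recieve teh reply", lambda d: None, long_term)
```

The plumbing is the easy part; the hard part is having a `revise` and a `review` that are actually any good.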

I agree the conscious AGI stuff is the tricky part. But then, maybe it's not. Maybe consciousness is not as clever as we think it is, and if you have a good enough self-attention model the AGI just needs to be symbolic logic.

I'm thinking of something that'd pass a Turing test, btw. Not something that's hyper smart.