Hacker News new | ask | show | jobs
by lawlessone 987 days ago
So is it analogous to how a CNN starts with fragments of images and further up the chain assembles these into objects?
1 comments

Yes, I think that is a reasonable way to think about it, in my opinion. However, with the language modeling objective it predicts the next token and because of the residual connections each intermediate layer is in the same space. So, maybe it would be more accurate to say that it is an increasingly accurate representation of the next token.