Hacker News new | ask | show | jobs
by whiteandnerdy 1158 days ago
I think it does come down to semantics. When you say "weights", people will take you to mean the pre-trained parameters of the network.

I agree that in some sense the attention weights are more like meta-weights that are applied to the context of the conversation to decide how to actually weight the various words. So it's totally correct to say that previous words in the conversation affect how future words will be weighted, and I think it's reasonable to call that 'learning': for example, you can tell ChatGPT new words and it will be able to use them in context. Again though, people usually take 'learning' to mean making updates to the trained parameters of the model itself, which obviously isn't happening here.