by miltondts 1291 days ago
What I don't understand is where the memory is. How does GPT-3 or ChatGPT remember so much information with just that architecture? It would seem that the most it could remember is 2048 tokens.

EDIT: Maybe it's 2048 x 96? Still seems low for what it can do.


The memory is in the weights: roughly 300bn weights at 4 bytes/weight is 1.2 TB.
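To spell out that arithmetic, here's a quick sketch. The helper name `weight_storage_tb` is made up for illustration, and 4 bytes/weight assumes 32-bit floats. The 300bn figure is the one quoted above; for comparison, the published parameter count for GPT-3 is 175bn.

```python
# Back-of-the-envelope weight storage: number of weights x bytes per weight.
def weight_storage_tb(num_weights: float, bytes_per_weight: int = 4) -> float:
    """Return storage in terabytes, using 1 TB = 1e12 bytes."""
    return num_weights * bytes_per_weight / 1e12

thread_estimate = weight_storage_tb(300e9)  # figure quoted in this thread
gpt3_published = weight_storage_tb(175e9)   # published GPT-3 parameter count

print(f"{thread_estimate:.1f} TB, {gpt3_published:.1f} TB")
```

Either way the point stands: what the model "knows" is stored across hundreds of gigabytes of weights, not in the 2048-token input.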
Yes, but how does it remember the stuff you told it earlier in the conversation? That 1.2 TB is the trained model, and I assume those weights are not changed by the conversation?
I believe that the previous input, from earlier in the conversation, is always prepended to the new input.
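A minimal sketch of what that prepending could look like, assuming the oldest turns are simply dropped once the history no longer fits. This is not confirmed to be how ChatGPT does it. `build_prompt` and `count_tokens` are hypothetical names, and whitespace splitting is only a crude stand-in for a real subword tokenizer.

```python
MAX_CONTEXT_TOKENS = 2048  # context length mentioned upthread

def count_tokens(text: str) -> int:
    # Crude stand-in: real models use a subword tokenizer, not whitespace splitting.
    return len(text.split())

def build_prompt(history: list[str], new_message: str,
                 limit: int = MAX_CONTEXT_TOKENS) -> str:
    """Prepend as many of the most recent turns as fit within `limit` tokens."""
    kept = [new_message]
    used = count_tokens(new_message)
    # Walk backwards so the newest turns are kept first.
    for turn in reversed(history):
        cost = count_tokens(turn)
        if used + cost > limit:
            break  # older turns fall outside the window and are "forgotten"
        kept.append(turn)
        used += cost
    # Restore chronological order before joining.
    return "\n".join(reversed(kept))

history = ["User: My name is Ada.", "Bot: Nice to meet you, Ada."]
prompt = build_prompt(history, "User: What is my name?")
print(prompt)
```

The weights never change here. The model just re-reads the whole transcript on every turn, which is why very long conversations eventually lose their beginning.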