by mjburgess 1291 days ago
300bn weights at 4 bytes/weight is 1.2 TB.
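For anyone who wants to check that arithmetic, here is a quick sketch. It assumes 4 bytes/weight means 32-bit floats and that 1 TB is 10**12 bytes; the function name is just made up for illustration.

```python
# Back-of-the-envelope size estimate for the figures quoted above.
# Assumptions: 32-bit (4-byte) weights, decimal terabytes (1 TB = 10**12 bytes).
def model_size_tb(num_weights: int, bytes_per_weight: int = 4) -> float:
    """Return the raw size of the weights in terabytes."""
    return num_weights * bytes_per_weight / 1e12


size_tb = model_size_tb(300_000_000_000)
print(f"{size_tb:.1f} TB")
```

Storing the same weights at 2 bytes each would halve that number, so the figure depends heavily on the assumed precision.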

Yes, but how does it remember the stuff you told it earlier in the conversation? That 1.2 TB is the trained model, and I assume those weights are not changed by the conversation?
I believe that the previous input, from earlier in the conversation, is always prepended to the new input.
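A minimal sketch of that idea, as I understand it: the model is treated as a fixed function from prompt to reply, and the "memory" is just a growing list of earlier turns that gets glued onto each new input. Everything named here (`Conversation`, `build_prompt`, `echo_model`) is hypothetical and only stands in for a real model; this is not how any particular product is known to be implemented.

```python
from typing import Callable, List


def build_prompt(history: List[str], new_input: str) -> str:
    """Prepend all earlier turns to the new input, one turn per line."""
    return "\n".join(history + [new_input])


class Conversation:
    def __init__(self, model: Callable[[str], str]) -> None:
        # The model is a fixed prompt -> reply function; nothing here updates it.
        self.model = model
        self.history: List[str] = []

    def ask(self, user_input: str) -> str:
        user_turn = f"User: {user_input}"
        # The model sees the whole conversation so far plus the new turn.
        prompt = build_prompt(self.history, user_turn)
        reply = self.model(prompt)
        # Only the stored transcript grows between calls.
        self.history += [user_turn, f"Assistant: {reply}"]
        return reply


def echo_model(prompt: str) -> str:
    # Hypothetical stand-in for a real language model:
    # it only reports how many lines of context it was given.
    return f"I saw {len(prompt.splitlines())} line(s) of context"


chat = Conversation(echo_model)
first = chat.ask("My name is Ada.")
second = chat.ask("What is my name?")
print(first)
print(second)
```

The second call receives three lines (the first question, the first reply, and the new question), even though the stand-in model itself never changes.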