by SwellJoe 33 days ago
That's just the plain text (or whatever files); it's not the context the model is directly working with on the server, which is tokenized, embedded as vectors, and has attention run against those vectors. The local history is generally quite small; the context is generally quite a bit larger. A conversation of a few hundred kilobytes in plain text will be gigabytes in context.
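A rough back-of-envelope sketch of that kilobytes-to-gigabytes gap, counting only the attention key/value cache. Every number here is an illustrative assumption, not something stated in the thread: roughly 4 bytes of English text per token, and a hypothetical dense model with 80 layers, 8 KV heads, head dimension 128, and 16-bit cache values. `kv_cache_bytes` is a made-up helper name.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value):
    """Estimate KV cache size for one sequence.

    The factor of 2 covers one key vector and one value vector
    stored per token, per layer, per KV head.
    """
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_value * num_tokens


# Assumed inputs (hypothetical, for illustration only).
text_bytes = 300 * 1000      # "a few hundred kilobytes" of plain text
bytes_per_token = 4          # rough average for English text
num_tokens = text_bytes // bytes_per_token

cache = kv_cache_bytes(
    num_tokens=num_tokens,
    num_layers=80,           # assumed model depth
    num_kv_heads=8,          # assumed grouped-query attention
    head_dim=128,            # assumed per-head dimension
    bytes_per_value=2,       # assumed fp16/bf16 cache
)

print(f"{num_tokens} tokens -> {cache / 1e9:.1f} GB of KV cache")
```

Under those assumptions, 300 KB of text is about 75,000 tokens and about 24.6 GB of cache, a blow-up of around five orders of magnitude. A different architecture, cache precision, or tokenizer would shift the figure a lot in either direction.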
1 comments

The KV cache for a SOTA model runs into terabytes.