| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bluegatty 31 days ago
	Chat history is kept locally, generally you have to send the 'whole history' to the model 'each turn'.

2 comments

SwellJoe 31 days ago

That's just the plain text (or whatever files), that's not the context the model is directly working with on the server, which is tokenized, embedded, vectorized and has attention run against those vectors. The local history is generally quite small, the context generally quite a bit larger. A text conversation of a few hundred kilobytes in plain text will be gigabytes in context.

link

bluegatty 31 days ago

KV for a sota model is into terrabytes

link

rixed 31 days ago

Only "generally"? I'm curious what API has moved away from this protocol that seems mode adapted to conversaions with humans than agentic loops.

link

_flux 31 days ago

To me it would certainly make sense if the protocol just said "append this text to context window id/sha256", in particular as the data is cached in tensor level in the provider side, so they need to first do that lookup anyway. So I would be surprised if they don't have that.

In addition, this protocol could make it more transparent to say "oh we cannot proceed as we dropped the this cache, are you sure you want to proceed and consume a whole lot of expensive uncached tokens?". Oh, maybe that's a reason not to do it..

link

bluegatty 31 days ago

So the standard API you pass it all along but I think there are some odd open ai apis that are different.

link