| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by olejorgenb 302 days ago
	> ... re-ordered to take into account LLM memory patterns. If I understand you correctly, doesn't this break prefix KV caching?

1 comments

CuriouslyC 301 days ago

It is done at immediately before the LLM call, transforming the message history for the API call.

This does reduce the context cache hit rate a bit, but I'm cache aware so I try to avoid repacking the early parts if I can help it. The tradeoff is 100% worth it though.

link

psadri 300 days ago

I’m curious about this project (I’m working on something similar). Anyway to get in contact with you?

link

CuriouslyC 300 days ago

you can click my spam protected email links on https://sibylline.dev, those should be working now. Any CTA will get me.

link