| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by postalcoder 678 days ago

To add some context, this isn't that novel of an approach. A common approach to improve RAG results is to "expand" the underlying chunks using an llm, so as to increase the semantic surface area to match against. You can further improve your results by running query expansion using HyDE[1], though it's not always an improvement. I use it as a fallback.

I'm not sure what Anthropic is introducing here. I looked at the cookbook code and it's just showing the process of producing said context, but there's no actual change to their API regarding "contextual retrieval".

The one change is prompt caching, introduced a month back, which allows you to very cheaply add better context to individual chunks by providing the entire (long) document as context. Caching is an awesome feature to expose to developers and I don't want to take anything away from that.

However, other than that, the only thing I see introduced is just a cookbook on how to do a particular rag workflow.

As an aside, Cohere may be my favorite API to work with. (no affiliation) Their RAG API is a delight, and unlike anything else provided by other providers. I highly recommend it.

1: https://arxiv.org/abs/2212.10496

2 comments

resiros 678 days ago

I think the innovation is using caching as so to make the cost of the approach manageable. The way they implemented it is that each time you create a chunk, you ask the llm to create an atomic chunk from the whole context. You need to do this for all tens of thousands of chunks in your data. This costs a lot. By caching the documents, you can spare costs

link

skeptrune 678 days ago

You could also just save the first outputted atomic chunk and store it then re-use it each time yourself. Easier and more consistent.

link

IanCal 677 days ago

I don't understand how that helps here. They're not regenerating each chunk every time, this is about caching the state after running a large doc through a model. You can only do this kind of thing if you have access to the model itself, or it's provided by the API you use.

link

postalcoder 678 days ago

To be fair, that only works if you keep chunk windows static.

link

postalcoder 678 days ago

Yup. Caching is very nice.. but the framing is weird. "Introducing" to me, connotes a product release, not a new tutorial.

link

bayesianbot 677 days ago

I was trying to do this using Prompt Caching like a month ago, but then noticed there's five minute maximum lifetime for the cached prompts - doesn't really work for my RAG needs (or probably most), where the queries would be ran during the next month or a year. I can't see any changes to that policy. Little surprised to see them talk about Prompt Caching relating to RAG.

link

spott 677 days ago

They aren’t using the prompt caching on the query side, only on the embedding side… so you cache the document in the context window when ingesting it, but not during retrieval.

link

KTibow 677 days ago

It seems a little odd to make multiple requests instead of using one request to create all the context for all the chunks.

link