If you are using LLM RAG – you should be doing RAFT (techcommunity.microsoft.com)
52 points by shishirpatil 825 days ago
9 comments

This is a three-way collaboration between Berkeley AI, Microsoft Azure, and Meta AI! RAFT targets domain-specific RAG, a narrower and increasingly popular setting compared to the broader, general open-book exam. In such exams, the domain on which the LLM will be evaluated is known in advance, so the model can be fine-tuned on that domain and then answer prompts using any and all information from it.

Blog: https://gorilla.cs.berkeley.edu/blogs/9_raft.html

It's "Retrieval Augmented Fine Tuning". The related blog post is interesting: https://gorilla.cs.berkeley.edu/blogs/9_raft.html
> Retrieval Aware Fine-Tuning (RAFT), presents a novel recipe to prepare fine-tuning data to tailor the models for domain-specific open-book setting, equivalent to in-domain RAG.
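A minimal sketch of what "preparing fine-tuning data" could look like, assuming the setup the linked post describes: each question is paired with its "oracle" document plus distractor documents from the same domain, and (as quoted further down this thread) the oracle is removed from some training instances. The helper `make_raft_example`, its parameters `num_distractors` and `p_oracle`, and their default values are hypothetical placeholders, not the authors' code; the real recipe has more to it.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class TrainingExample:
    question: str
    context: list[str]  # documents placed in the prompt
    answer: str         # target the model is fine-tuned to produce


def make_raft_example(
    question: str,
    answer: str,
    oracle_doc: str,
    corpus: list[str],
    num_distractors: int = 3,   # hypothetical default
    p_oracle: float = 0.8,      # hypothetical default
    rng: random.Random | None = None,
) -> TrainingExample:
    """Hypothetical helper building one RAFT-style training example.

    With probability p_oracle the oracle document is shown alongside the
    distractors; otherwise only distractors are shown, but the target answer
    stays the same, pushing the model to rely on domain knowledge.
    """
    if rng is None:
        rng = random.Random()
    # Distractors are other documents drawn from the same domain corpus.
    candidates = [d for d in corpus if d != oracle_doc]
    distractors = rng.sample(candidates, k=min(num_distractors, len(candidates)))
    context = list(distractors)
    if rng.random() < p_oracle:
        context.append(oracle_doc)
    # Shuffle so the oracle's position in the prompt carries no signal.
    rng.shuffle(context)
    return TrainingExample(question=question, context=context, answer=answer)
```

Keeping the original answer even when the oracle is missing is what distinguishes this from simply training the model to ignore irrelevant chunks: it is also pushed to absorb the domain itself.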

I can't see why this is limited to a specific domain. The key problem being solved is the known one where RAG returns an irrelevant chunk, and the "benefit" seems to be training a model to ignore irrelevant chunks.

My guess is that training on multiple domains costs money, so they limited their research to one domain at a time. But I'm not sure whether there is a "bigger reason" this approach couldn't produce a general fine-tuned "answer only from relevant chunks" model. The paper seems to imply this only works for specific domains, but I can't see why.

From [1]:

> We demonstrate that our RAG approach trains the model to perform better RAG on the set of documents it is trained on i.e., in-domain. By removing the oracle documents in some instances of the training data, we are compelling the model to memorize domain-knowledge.

What if you wanted to train it to say that it didn't find the answer?

[1] https://gorilla.cs.berkeley.edu/blogs/9_raft.html
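One hypothetical way to act on that question, which is not part of RAFT: when the oracle document is dropped, replace the target with a fixed refusal string instead of keeping the original answer. The `ABSTAIN` string and `make_abstaining_example` helper are illustrative assumptions.

```python
from __future__ import annotations

import random

# Hypothetical refusal target; any fixed string would work.
ABSTAIN = "I could not find the answer in the provided documents."


def make_abstaining_example(
    question: str,
    answer: str,
    oracle_doc: str,
    distractors: list[str],
    p_oracle: float = 0.8,  # hypothetical default
    rng: random.Random | None = None,
) -> dict:
    """Hypothetical variant: the target becomes ABSTAIN when the oracle is absent."""
    if rng is None:
        rng = random.Random()
    context = list(distractors)
    has_oracle = rng.random() < p_oracle
    if has_oracle:
        context.append(oracle_doc)
    rng.shuffle(context)
    # RAFT keeps `answer` here to force memorization of the domain; this
    # variant instead teaches the model to refuse without supporting evidence.
    target = answer if has_oracle else ABSTAIN
    return {"question": question, "context": context, "answer": target}
```

The trade-off is that this pulls in the opposite direction from the quoted passage: it rewards answering strictly from the retrieved context rather than memorizing domain knowledge.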

Does any of this matter when you have 5 million token input and can just shove everything into the input?
How to learn quadratic scaling the hard way.
Yes, because 5 million token input is slow, expensive, and error-prone.
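A back-of-the-envelope sketch of the "quadratic scaling" point, assuming vanilla dense self-attention, where every token attends to every other token, so each head in each layer computes an n x n score matrix. The 4,096-token RAG prompt size is an arbitrary assumption for comparison; production long-context systems may use techniques that change these numbers.

```python
def attention_scores(n_tokens: int) -> int:
    """Entries in one dense n x n attention score matrix (one head, one layer)."""
    return n_tokens * n_tokens


def relative_cost(long_ctx: int, short_ctx: int) -> float:
    """How many times more score entries the long context needs than the short one."""
    return attention_scores(long_ctx) / attention_scores(short_ctx)


if __name__ == "__main__":
    full_dump = 5_000_000  # "shove everything into the input"
    rag_prompt = 4_096     # assumed size of a prompt holding a few retrieved chunks
    print(f"{attention_scores(full_dump):.3e} scores per head per layer")
    print(f"{relative_cost(full_dump, rag_prompt):,.0f}x more than a {rag_prompt}-token prompt")
```

Under these assumptions the 5-million-token prompt needs roughly 1.5 million times as many attention scores as the 4,096-token one.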
> They hypothesized that a student who studies the textbooks before the open-book exam was likely to perform better than a student who studies the textbook.

interesting hypothesis, but probably not what was meant.

“editor” is shaping up to be the hot new job for fleshy consciousnesses by the early thirties.

Page me when this process is at least partially automated and continuous.
Call me when there’s a nix flake for it!
RAFT is amazing!!
Am I missing the point or is this not how everyone was doing RAG before?

I've had much greater success doing the RAG process on a fine-tuned model.

> Am I missing the point or is this not how everyone was doing RAG before?

No, especially not people wrapping RAG frameworks around models they can't fine-tune (e.g., GPT-4 if you aren't OpenAI and/or Microsoft).

Edit: Further evidence that this method is useful doesn't help people in that situation, though.