by tslmy 802 days ago
To make an LLM relevant to you, your intuition might be to fine-tune it with your data, but:

1. Training an LLM is expensive.

2. Because training is costly, it’s hard to keep an LLM updated with the latest information.

3. Observability is lacking. When you ask an LLM a question, it’s not obvious how the LLM arrived at its answer.

There’s a different approach: Retrieval-Augmented Generation (RAG). Instead of asking the LLM to generate an answer immediately, frameworks like LlamaIndex:

1. retrieve information from your data sources first,

2. add it to your question as context, and

3. ask the LLM to answer based on the enriched prompt.
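
The three steps above can be sketched in a few lines of Python. This is a toy illustration, not how LlamaIndex works internally: the document store, the word-overlap scoring, and every name here (`DOCUMENTS`, `retrieve`, `build_prompt`, `answer`, `echo_llm`) are made up for the example, and `echo_llm` is only a stand-in for a real model call.

```python
import re
from collections import Counter

# Hypothetical in-memory "data source"; a real setup would read files or a database.
DOCUMENTS = {
    "notes/rent.txt": "Rent is due on the first day of each month.",
    "notes/wifi.txt": "The office wifi password is rotated every Friday.",
    "notes/lunch.txt": "Team lunch happens on Wednesday at noon.",
}

def tokenize(text):
    """Lowercase the text and count its words."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=2):
    """Step 1: rank documents by word overlap with the question."""
    q = tokenize(question)
    scored = []
    for name, text in documents.items():
        overlap = sum((q & tokenize(text)).values())
        if overlap > 0:
            scored.append((overlap, name))
    # Highest overlap first; ties broken by name so the order is stable.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:top_k]]

def build_prompt(question, documents, sources):
    """Step 2: add the retrieved text to the question as context."""
    context = "\n".join(f"[{name}] {documents[name]}" for name in sources)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer using only the context."

def answer(question, documents, llm):
    """Step 3: ask the LLM, and return the sources so they can be inspected."""
    sources = retrieve(question, documents)
    prompt = build_prompt(question, documents, sources)
    return llm(prompt), sources

def echo_llm(prompt):
    """Stand-in for a real model call: only reports the prompt size."""
    return f"(model reply to a {len(prompt)}-character prompt)"

reply, sources = answer("When is the wifi password rotated?", DOCUMENTS, echo_llm)
print(reply)
print("Sources:", sources)
```

Adding or editing an entry in `DOCUMENTS` changes what the next call retrieves, with no retraining, and the returned `sources` list shows which documents fed the answer. Real frameworks typically swap the word-overlap scoring for embedding similarity.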

RAG overcomes all three weaknesses of the fine-tuning approach:

1. There’s no training involved, so it’s cheap.

2. Data is fetched only when you ask for it, so it’s always up to date.

3. The framework can show you the retrieved documents, so it’s more trustworthy.

(https://lmy.medium.com/why-rag-is-big-aa60282693dc)

1 comment

This is the state of LLMs today - it is likely that future models will support some form of "online" training, or that new training methods will be far less compute-intensive. Many people are working on these scaling issues. We already have attention variants that work around the quadratic time and space cost of attention in the length of the input prompt.