| I’m consulting multiple teams on shipping LLM-driven business automation. So far I have seen only one case where fine-tuning a model really paid off (and didn’t just blow up the RLHF calibration and cause wild hallucinations). I would suggest avoiding training and looking into RAG systems, prompt engineering and the OpenAI API for a start. You can do a small PoC quickly using something like LangChain or LlamaIndex. Their pipelines can ingest unstructured data in most common file formats, which is good for getting a quick feel. Afterwards, if you encounter hallucinations in your tasks - throw the vector DB and embeddings in the trash (they are pulling junk information into the context and causing hallucinations). Replace embeddings with a RAG based on full-text search and query expansion tuned to the nuances of your business. If there are any specific types of questions or requests that need special handling - add a lightweight router (request classifier) that directs the user request to a dedicated prompt with dedicated data. By that time you will probably have lost all of the RAG, replacing it with a couple of prompt templates, a file-based knowledge base in Markdown and CSV and a few helpers to pull relevant information into the context. That’s how most working LLM-driven workflows end up (in my bubble). Maybe just with PostgreSQL and ES instead of a file-based knowledge base. But that’s an implementation detail. Update: if you really want to try fine-tuning your own LLM - this article links to a Google Colab notebook for the latest Llama 3.1 8B: https://unsloth.ai/blog/llama3-1 It will not learn new things from your data, though. It might just pick up the style. |
Not sure why this would be true. In my experience, semantic search based on a vector index/embeddings pulls in more relevant information than a full-text keyword search. Maybe there is too broad a set of materials in your vector db, or the chunking strategy isn't good?
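For concreteness, here is a minimal sketch of the setup the quoted comment describes: keyword search with domain-specific query expansion, plus a lightweight router that picks a dedicated prompt template. Everything in it is an assumption made for illustration (the `KNOWLEDGE_BASE` contents, the `SYNONYMS` table, the `ROUTES` keywords and the prompt wording are all hypothetical), and the term-count scoring stands in for a real full-text engine such as PostgreSQL or Elasticsearch.

```python
import re
from collections import Counter

# Hypothetical knowledge base; in practice these would be Markdown/CSV files.
KNOWLEDGE_BASE = {
    "refunds.md": "Refunds are issued within 14 days of a return request.",
    "shipping.md": "Standard delivery takes 3 to 5 business days.",
    "pricing.md": "The invoice lists the subscription price and tax.",
}

# Hypothetical business vocabulary used for query expansion.
SYNONYMS = {
    "reimbursement": ["refund", "refunds"],
    "bill": ["invoice"],
    "postage": ["shipping", "delivery"],
}

# Hypothetical router rules: route name -> trigger keywords.
ROUTES = {
    "billing": {"invoice", "bill", "price", "refund", "reimbursement"},
    "logistics": {"shipping", "delivery", "postage"},
}

PROMPTS = {
    "billing": "You are a billing assistant.\nContext:\n{context}\nQuestion: {question}",
    "logistics": "You are a logistics assistant.\nContext:\n{context}\nQuestion: {question}",
    "general": "Answer using only this context:\n{context}\nQuestion: {question}",
}


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_query(query):
    """Return the query tokens plus any domain synonyms for them."""
    tokens = tokenize(query)
    expanded = list(tokens)
    for token in tokens:
        expanded.extend(SYNONYMS.get(token, []))
    return expanded


def search(query, top_k=2):
    """Rank documents by how often the expanded query terms occur in them."""
    terms = set(expand_query(query))
    scores = {}
    for name, text in KNOWLEDGE_BASE.items():
        counts = Counter(tokenize(text))
        score = sum(counts[term] for term in terms)
        if score > 0:
            scores[name] = score
    return sorted(scores, key=lambda name: (-scores[name], name))[:top_k]


def route(query):
    """Pick the route with the most keyword overlap, else fall back to 'general'."""
    tokens = set(tokenize(query))
    best = max(ROUTES, key=lambda name: len(tokens & ROUTES[name]))
    return best if tokens & ROUTES[best] else "general"


def build_prompt(query):
    """Fill the routed prompt template with the retrieved context."""
    context = "\n".join(KNOWLEDGE_BASE[name] for name in search(query))
    return PROMPTS[route(query)].format(context=context, question=query)


if __name__ == "__main__":
    print(build_prompt("When do I get my reimbursement?"))
```

In this toy example the word "reimbursement" never appears in the knowledge base, so plain keyword matching would return nothing; the synonym table is what bridges the gap. A vector index would try to close the same gap with embeddings, which is exactly the trade-off being debated here.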