This post covers lessons learned from running a RAG application in production.
Here are the lessons:
• Always customise your prompt.
• Set soft and hard limits on your LLM cost before launching any project.
• Choose the LLM wisely.
• Context length matters a lot.
• Cache your queries.
• Have a router that picks the right LLM for each query.
• Have a UI to see all queries, answers, context, and metrics such as response time.
• Memory management in chat is painful.
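
Two of the points above, cost limits and query caching, can be sketched together in a few lines. This is only a minimal illustration, not how any particular production system does it: the names `CostGuard`, `CachedLLM`, and `call_llm` are hypothetical, the flat per-call cost is an assumption (real pricing is usually per token), and the cache is a plain in-memory dict keyed on the normalised query text.

```python
import hashlib


class BudgetExceeded(RuntimeError):
    """Raised when a call would push spending past the hard limit."""


class CostGuard:
    """Tracks spend; warns at the soft limit, refuses calls past the hard limit."""

    def __init__(self, soft_limit_usd, hard_limit_usd):
        if soft_limit_usd > hard_limit_usd:
            raise ValueError("soft limit must not exceed hard limit")
        self.soft_limit_usd = soft_limit_usd
        self.hard_limit_usd = hard_limit_usd
        self.spent_usd = 0.0
        self.warnings = []  # in a real system this would be an alert, not a list

    def charge(self, cost_usd):
        # Hard limit: block the call before any money is spent.
        if self.spent_usd + cost_usd > self.hard_limit_usd:
            raise BudgetExceeded(
                f"hard limit of {self.hard_limit_usd} USD would be exceeded"
            )
        self.spent_usd += cost_usd
        # Soft limit: keep serving, but record a warning.
        if self.spent_usd >= self.soft_limit_usd:
            self.warnings.append(f"soft limit reached: {self.spent_usd} USD spent")


class CachedLLM:
    """Wraps an LLM call with an exact-match cache and a cost guard."""

    def __init__(self, call_llm, guard, cost_per_call_usd):
        self._call_llm = call_llm      # hypothetical callable: str -> str
        self._guard = guard
        self._cost = cost_per_call_usd  # assumed flat cost per call
        self._cache = {}

    @staticmethod
    def _key(query):
        # Normalise case and whitespace so trivially different queries share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, query):
        key = self._key(query)
        if key in self._cache:
            return self._cache[key]     # cache hit: no cost, no LLM call
        self._guard.charge(self._cost)  # raises BudgetExceeded at the hard limit
        answer = self._call_llm(query)
        self._cache[key] = answer
        return answer
```

A repeated query such as "What is RAG?" and "  what is   rag? " maps to the same key here, so only the first one reaches the model and is charged. An exact-match cache like this misses paraphrases; matching on embeddings is a common alternative, at the price of occasionally returning an answer to a slightly different question.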