by staranjeet 904 days ago
This post shares lessons learned from running a RAG application in production.

The lessons:

• Always customise your prompt.
• Set soft and hard limits on your LLM cost before launching any project.
• Choose the LLM wisely.
• Context length matters a lot.
• Cache your queries.
• Have a router to choose the LLM for each query.
• Have a UI to see all queries, answers, context, and metrics like response time.
• Memory management in chat is painful.
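
Two of the points above, caching queries and setting soft and hard cost limits, can be sketched together in a few lines. This is only an illustration under stated assumptions, not the setup from the post: `CachedBudgetedLLM`, `BudgetExceeded`, `call_llm`, and the flat `cost_per_call` are all hypothetical names, and a real application would compute cost from token counts reported by its provider.

```python
import hashlib
from typing import Callable, Dict, List


class BudgetExceeded(RuntimeError):
    """Raised when a new LLM call would cross the hard cost limit."""


class CachedBudgetedLLM:
    """Hypothetical wrapper: exact-match query cache plus soft/hard cost limits."""

    def __init__(self, call_llm: Callable[[str], str], cost_per_call: float,
                 soft_limit: float, hard_limit: float) -> None:
        self.call_llm = call_llm            # assumed: any function str -> str
        self.cost_per_call = cost_per_call  # assumed flat price per call
        self.soft_limit = soft_limit        # warn once spend reaches this
        self.hard_limit = hard_limit        # refuse calls that would exceed this
        self.spent = 0.0
        self.warnings: List[str] = []
        self._cache: Dict[str, str] = {}

    @staticmethod
    def _key(query: str) -> str:
        # Normalise case and whitespace so trivially different queries share a key.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def ask(self, query: str) -> str:
        key = self._key(query)
        if key in self._cache:
            # Cache hit: no LLM call, no cost.
            return self._cache[key]
        if self.spent + self.cost_per_call > self.hard_limit:
            raise BudgetExceeded(f"hard limit {self.hard_limit} would be exceeded")
        answer = self.call_llm(query)
        self.spent += self.cost_per_call
        if self.spent >= self.soft_limit:
            # Soft limit: keep serving, but record a warning for the operator.
            self.warnings.append(f"soft limit reached: spent {self.spent}")
        self._cache[key] = answer
        return answer
```

Note that an exact-match cache like this ignores the retrieved context, so in a RAG setting the key would probably also need to include something that changes when the underlying documents change.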