| Thank you for your feedback! Building apps with LLMs is fundamentally an exercise in manipulating strings and making API calls (both LLM and vector DB APIs). When apps get more complex and start including multiple LLM calls and prompts that can change dynamically, some additional challenges start to emerge, e.g. figuring out: - When was a particular LLM called? - How much time did it take? - What were the input variables? - What did the prompt look like? - What was the exact configuration of each LLM? - How many times did we retry the request? - What was the raw data the API returned? - What exactly was returned from the vector store and fed into an LLM? - How many tokens were used? - What was the final result for each call? - How do you make API calls in parallel? I think a framework like this should provide abstractions that let people focus on the important parts, like prompt engineering and productionizing the app, and worry less about figuring out things like this. Right now LLMFlows supports only OpenAI models and Pinecone, but I am working on classes for Chroma and Weaviate. I would also like to provide support for Bard and Claude once I get access. |
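To make the list of questions above concrete, here is a minimal sketch of the kind of bookkeeping they imply. It is not LLMFlows' API: `CallRecord`, `traced_call`, and `make_flaky_llm` are hypothetical names made up for illustration, and the "LLM" is a local stand-in function so the example runs without any network access or API key.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class CallRecord:
    """Hypothetical record of one LLM call: inputs, prompt, timing, retries, output."""
    prompt_template: str
    input_variables: Dict[str, Any]
    config: Dict[str, Any]
    prompt: str = ""
    started_at: float = 0.0      # wall-clock timestamp: when was the LLM called?
    duration_s: float = 0.0      # how much time did it take?
    retries: int = 0             # how many times did we retry?
    raw_response: Optional[Dict[str, Any]] = None
    result: Optional[str] = None
    total_tokens: Optional[int] = None


def traced_call(
    llm: Callable[..., Dict[str, Any]],
    prompt_template: str,
    input_variables: Dict[str, Any],
    config: Dict[str, Any],
    max_retries: int = 3,
) -> CallRecord:
    """Call `llm` with the rendered prompt and record what happened."""
    record = CallRecord(prompt_template, dict(input_variables), dict(config))
    record.prompt = prompt_template.format(**input_variables)
    record.started_at = time.time()
    start = time.perf_counter()

    raw = None
    for attempt in range(max_retries + 1):
        record.retries = attempt  # 0 on the first try, 1 after one failure, ...
        try:
            raw = llm(record.prompt, **config)
            break
        except RuntimeError:
            if attempt == max_retries:
                raise  # out of retries: surface the error to the caller

    record.duration_s = time.perf_counter() - start
    record.raw_response = raw
    record.result = raw["text"]
    record.total_tokens = raw.get("total_tokens")
    return record


def make_flaky_llm(failures: int) -> Callable[..., Dict[str, Any]]:
    """Stand-in for a real client: fails `failures` times, then 'answers'."""
    state = {"remaining": failures}

    def fake_llm(prompt: str, **config: Any) -> Dict[str, Any]:
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise RuntimeError("simulated transient API error")
        # Fake response shape; a real API would return something different.
        return {"text": prompt.upper(), "total_tokens": len(prompt.split())}

    return fake_llm


record = traced_call(
    make_flaky_llm(failures=1),
    "Summarize: {text}",
    {"text": "hello world"},
    {"temperature": 0.0},
)
print(record)
```

With a real client you would swap `fake_llm` for the actual API call and map its response fields into the record; the response shape used here is an assumption, not any provider's format.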
I appreciate the detailed explanation, thank you!
If you'll permit another noob question, can/should frameworks like this include source attribution for responses? Based on articles like https://jamesg.blog/2023/04/02/llm-prompts-source-attributio... I'd guess the strict answer would be no, and that any strategy you might use to elicit citations could end up hallucinating them.