| We are exploring many use cases for implementing GenAI solutions and are building most of them in-house, so we are writing a lot of prompts across various product and engineering teams. I am looking for the best tools for managing and testing prompts for different use cases. Must have: a UI where PMs can test prompts, including running the same prompt on different models with a high-level overview of the cost incurred per model for the result; an SDK/API to fetch these prompts in code, with versioning, for different use cases; dynamic rules for A/B testing of prompts. Good to have:
Help with crafting prompts, creating nested prompt workflows (chains of prompts), etc. Basically, I am looking for a LaunchDarkly-type solution for prompts, where you can also create dynamic rules to load different prompts and feature-flag them based on user persona and team. I am also interested in hearing how other teams are managing this. Is there a better way, or something that I am missing? |
We have quite a few developer teams using us for many of these use cases today. A couple of examples: controlling model functionality via JSON configs, pushing config variables in via string flags, and managing prompt configurations. Beyond that, teams are using targeting to add a layer of control over who receives those configurations (the beta opt-in example comes up a TON).
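To make the JSON config plus targeting idea concrete, here's a rough sketch in plain Python. To be clear, this is not the LaunchDarkly SDK: the flag layout, the `resolve` and `bucket` helpers, the model names, and the `beta_opt_in` attribute are all made up for illustration. It just shows the shape of the pattern: a prompt config stored as a JSON-style variation, a targeting rule for beta opt-ins, and a percentage rollout for everyone else.

```python
import hashlib

# Hypothetical flag definition; every name and value here is illustrative.
FLAG = {
    "key": "support-summary-prompt",
    "variations": {
        "control": {
            "model": "model-a",
            "temperature": 0.2,
            "prompt": "Summarize this support ticket: {ticket}",
        },
        "beta": {
            "model": "model-b",
            "temperature": 0.4,
            "prompt": "Summarize this ticket in three bullet points: {ticket}",
        },
    },
    # Targeting rules are checked first, in order.
    "rules": [{"attribute": "beta_opt_in", "equals": True, "serve": "beta"}],
    # Percentage rollout for users that match no rule (must sum to 100).
    "rollout": {"beta": 20, "control": 80},
}


def bucket(flag_key, user_key):
    """Map a user to a stable bucket in [0, 100) by hashing flag and user keys."""
    digest = hashlib.sha256(f"{flag_key}:{user_key}".encode()).hexdigest()
    return int(digest[:8], 16) % 100


def resolve(flag, context):
    """Return (variation_name, config) for a user context dict."""
    for rule in flag["rules"]:
        if context.get(rule["attribute"]) == rule["equals"]:
            return rule["serve"], flag["variations"][rule["serve"]]
    user_bucket = bucket(flag["key"], context["key"])
    cumulative = 0
    for name, percent in flag["rollout"].items():
        cumulative += percent
        if user_bucket < cumulative:
            return name, flag["variations"][name]
    # Fallback if the rollout percentages do not cover the bucket.
    return "control", flag["variations"]["control"]


if __name__ == "__main__":
    name, config = resolve(FLAG, {"key": "user-42", "beta_opt_in": True})
    print(name, config["model"])
    print(config["prompt"].format(ticket="Login page times out"))
```

Hashing the flag key together with the user key keeps each user in the same variation across requests, which is what you want for an A/B test on prompts. In a real setup the flag service would own the rules and rollout, and your code would only ask for the resolved config.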
The gap today is the "help in crafting the prompt" side of things. We're having some really cool conversations internally about ideas for making that better. It's totally doable today; the UX could just be better.
Another one that you didn't mention, and that comes up A LOT, is measuring the changes and how effective they are (especially with financial institutions). For example: "If I roll out this new model, is it performing better? Are we getting better results from the new prompts we built?"
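If you want a feel for what that measurement looks like in its simplest form, here's a small sketch, again plain Python and not any product's API. It assumes you log one event per response as `(variation, success, cost)`, where `success` is whatever signal you trust (a thumbs-up, a passed eval, a resolved ticket); those field names and the helper functions are assumptions for illustration. It summarizes success rate and average cost per variation and computes a two-proportion z-score as a rough check on whether the difference is noise.

```python
import math
from collections import defaultdict


def summarize(events):
    """Aggregate (variation, success, cost) events into per-variation stats."""
    totals = defaultdict(lambda: {"n": 0, "successes": 0, "cost": 0.0})
    for variation, success, cost in events:
        entry = totals[variation]
        entry["n"] += 1
        entry["successes"] += int(success)
        entry["cost"] += cost
    return {
        name: {
            "n": t["n"],
            "successes": t["successes"],
            "success_rate": t["successes"] / t["n"],
            "avg_cost": t["cost"] / t["n"],
        }
        for name, t in totals.items()
    }


def two_proportion_z(successes_a, n_a, successes_b, n_b):
    """Z-score for the difference in success rates (B minus A), pooled variance."""
    rate_a = successes_a / n_a
    rate_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    return (rate_b - rate_a) / std_err if std_err else 0.0


if __name__ == "__main__":
    # Tiny made-up sample, only to show the call pattern.
    events = [
        ("control", True, 0.01),
        ("control", False, 0.01),
        ("beta", True, 0.02),
        ("beta", True, 0.02),
    ]
    stats = summarize(events)
    for name, s in stats.items():
        print(name, s)
    a, b = stats["control"], stats["beta"]
    print(two_proportion_z(a["successes"], a["n"], b["successes"], b["n"]))
```

A z-score with absolute value above roughly 1.96 corresponds to the usual 5% two-sided threshold, but with small samples like the one above it tells you very little, so treat it as a sanity check and not a verdict.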
Shameless plug - I did a video on this with Amazon for Bedrock, but we did another one using LD for multiple different model providers too. Here's the link - https://www.youtube.com/watch?v=dTyxRnuI3FQ
If you have any questions, feel free to drop me a line. Happy to jump on a call, show you a bit of it live, and answer questions. It's such a great time to be building software. You can also DM me on Twitter (X, whatever) @codydearkand