LaunchDarkly for Prompt Management
3 points by yashsharma08 825 days ago
We are exploring many use cases for implementing GenAI solutions and are building most of them in-house, so we are writing a lot of prompts across various teams in product and engineering.

I have been looking for good tools for managing and testing prompts across different use cases. Things I am looking for:

Must have:

A UI where PMs can go and test prompts. They should be able to run the same prompt against different models and see a high-level overview of the cost incurred across those models for the result.

An SDK/API to fetch these prompts in code, with versioning, for different use cases.

Dynamic rules for A/B testing of prompts.
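
To make the last two requirements concrete, here is a minimal sketch of what versioned prompt fetching plus a deterministic A/B split could look like. Everything in it is an assumption for illustration: `PromptStore`, `publish`, `get`, and `ab_bucket` are hypothetical names, not the API of any real product, and the store is in-memory only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PromptStore:
    """In-memory stand-in for a prompt service (hypothetical, not a real SDK)."""

    _versions: dict = field(default_factory=dict)  # prompt name -> list of templates

    def publish(self, name: str, template: str) -> int:
        """Store a new version of a prompt and return its 1-based version number."""
        self._versions.setdefault(name, []).append(template)
        return len(self._versions[name])

    def get(self, name: str, version: Optional[int] = None) -> str:
        """Fetch a specific version, or the latest one when version is None."""
        versions = self._versions[name]
        if version is None:
            return versions[-1]
        if not 1 <= version <= len(versions):
            raise KeyError(f"{name} has no version {version}")
        return versions[version - 1]


def ab_bucket(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Assign a user to 'A' or 'B' by hashing, so the same user always
    lands in the same bucket for a given experiment."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    fraction = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "A" if fraction < split else "B"
```

A caller could publish two versions of a prompt, then serve `store.get(name, 1)` to bucket A and `store.get(name, 2)` to bucket B. Hashing on the experiment name plus user ID keeps assignments stable without storing them anywhere.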

Good to have: Ideally the tool also helps with crafting prompts, creating nested prompt workflows (chains of prompts), etc.

Basically, I am looking for a LaunchDarkly-type solution for prompts, where you can also create dynamic rules to load different prompts and feature-flag them based on user persona and team.

I am also interested in hearing how other teams are managing this. Is there a better way, or something that I am missing?

2 comments

Hi! Full disclosure - I work for LaunchDarkly. I looked after our Developer Experience team previously and now I look after our Product Incubation function, working on some of our new initiatives, AI being one of them.

We have quite a few developer teams using us for many of these use cases today. A couple of examples are controlling model functionality via JSON configs, pushing config variables in via string flags, prompt configurations, all that kind of stuff. Beyond that, teams are diving into using targeting to add a layer of control over who's receiving those configurations (the beta opt-in example comes up a TON).
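
As a rough illustration of the JSON-config-plus-targeting idea, here is a minimal sketch in plain Python. It does not use the LaunchDarkly SDK; the `FLAG` structure, the `evaluate` function, the attribute names, and the model names are all hypothetical and only mimic the general shape of a flag with variations, ordered rules, and a fallthrough.

```python
import json

# Hypothetical flag definition: each variation is a JSON string holding
# the model settings and prompt template to serve.
FLAG = {
    "variations": {
        "stable": json.dumps(
            {"model": "model-a", "temperature": 0.2, "prompt": "Summarize: {text}"}
        ),
        "beta": json.dumps(
            {"model": "model-b", "temperature": 0.7, "prompt": "Summarize in 3 bullets: {text}"}
        ),
    },
    # Rules are checked in order; the first match wins.
    "rules": [
        {"attribute": "beta_opt_in", "equals": True, "serve": "beta"},
        {"attribute": "team", "equals": "support", "serve": "beta"},
    ],
    # Served when no rule matches.
    "fallthrough": "stable",
}


def evaluate(flag: dict, context: dict) -> dict:
    """Return the parsed JSON config for the first rule the context matches,
    or the fallthrough variation if none match."""
    for rule in flag["rules"]:
        if context.get(rule["attribute"]) == rule["equals"]:
            return json.loads(flag["variations"][rule["serve"]])
    return json.loads(flag["variations"][flag["fallthrough"]])


# A user who opted into the beta gets the beta prompt config.
config = evaluate(FLAG, {"key": "user-123", "beta_opt_in": True})
prompt = config["prompt"].format(text="...")
```

Keeping the model settings and prompt together in one JSON variation means a single targeting change swaps both at once.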

The gap today is the "help in crafting the prompt" side of things. We're having some really cool conversations internally around ideas for this and how to make it better. It's totally doable today - the UX could just be better.

Another one that you didn't mention that comes up A LOT is measurement of the changes and how effective they are (we hear this especially often from financial institutions). I.e. "If I roll out this new model, is it performing better? Are we getting better results from the new prompts we built?"
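
A minimal sketch of that kind of measurement, assuming each response is logged with the variant that produced it and a boolean outcome (for example, a thumbs-up). The function name and event shape are hypothetical, and this only compares raw rates; it makes no claim about statistical significance.

```python
from collections import defaultdict


def success_rate_by_variant(events):
    """Given (variant, success) pairs, return the fraction of successful
    outcomes for each variant."""
    totals = defaultdict(lambda: [0, 0])  # variant -> [successes, count]
    for variant, success in events:
        totals[variant][0] += int(success)
        totals[variant][1] += 1
    return {variant: wins / count for variant, (wins, count) in totals.items()}


# Hypothetical logged outcomes for an old and a new prompt.
events = [
    ("old-prompt", True),
    ("old-prompt", False),
    ("new-prompt", True),
    ("new-prompt", True),
]
rates = success_rate_by_variant(events)
```

With real traffic you would want far more events per variant, and a proper experiment analysis, before concluding that one prompt or model is better.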

Shameless plug - I did a video on this with Amazon for Bedrock, but we did another one using LD for multiple different model providers too. Here's the link - https://www.youtube.com/watch?v=dTyxRnuI3FQ

If you have any questions, feel free to drop me a line. Happy to jump on a call, show you a bit of it live, and answer questions. It's such a great time to be building software. You can also DM me on Twitter (X, whatever) @codydearkand

Hey - Brock here from Statsig. We started out a couple of years ago trying to make the best A/B testing tooling for developers, which, as it turns out, a lot of the biggest AI companies have found valuable (OpenAI, Anthropic, etc.). Not always directly with prompts, but with model variants, new feature surfaces, etc.

With a combination of our feature flag and experiment tools, I think you could accomplish everything you're looking for, with the exception of helping craft the prompts (we tend to leave that to other tools). We have a very generous free tier - if you'd like to try it out, I'm happy to provide any help I can!