by resiros 872 days ago
Shameless promotion: I might have the tool for that :) https://github.com/agenta-ai/agenta We're building a platform for evaluating prompts (and more complex LLM workflows).

From what we've seen with our users, prompt results are highly stochastic, and it's hard to generalize. For example, a user building a sales assistant discovered that simply changing the order of the sentences in the prompt improved accuracy significantly.
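
To make the sentence-order point concrete, here's a rough sketch in plain Python of how you could test it yourself. This is not agenta's API. The `model` callable, the `toy_model` stand-in, the sentences, and the test case are all made up for illustration; you'd swap in a real LLM call and your own labeled examples.

```python
import itertools
from typing import Callable, List, Sequence, Tuple

Model = Callable[[str, str], str]   # (prompt, question) -> answer
Case = Tuple[str, str]              # (question, expected answer)

def accuracy(model: Model, prompt: str, cases: Sequence[Case], trials: int = 3) -> float:
    """Fraction of exact-match answers, repeating each case `trials` times
    because a real LLM can answer differently on every call."""
    hits = 0
    total = 0
    for question, expected in cases:
        for _ in range(trials):
            hits += model(prompt, question).strip() == expected
            total += 1
    return hits / total

def rank_orderings(model: Model, sentences: Sequence[str],
                   cases: Sequence[Case], trials: int = 3) -> List[Tuple[float, str]]:
    """Score every ordering of the prompt sentences, best first."""
    results = []
    for order in itertools.permutations(sentences):
        prompt = " ".join(order)
        results.append((accuracy(model, prompt, cases, trials), prompt))
    return sorted(results, reverse=True)

def toy_model(prompt: str, question: str) -> str:
    """Hypothetical stand-in for an LLM: it only 'obeys' the format
    instruction when that instruction is the last sentence of the prompt."""
    if prompt.endswith("Answer with one word."):
        return "yes"
    return "Yes, certainly."

# Made-up prompt sentences and test case, for illustration only.
sentences = ["You are a sales assistant.", "Answer with one word.", "Be polite."]
cases = [("Is this lead qualified?", "yes")]

ranked = rank_orderings(toy_model, sentences, cases)
for score, prompt in ranked:
    print(f"{score:.2f}  {prompt}")
```

Trying every permutation only works for a handful of sentences (n sentences give n! orderings), so for longer prompts you'd sample orderings instead. With a real model you'd also want more trials and more test cases before trusting any ranking.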