by resiros
872 days ago
Shameless promotion: I might have the tool for that :) https://github.com/agenta-ai/agenta
We're building a platform for evaluating prompts (and more complex LLM workflows). From what we've seen from users, prompt results are highly stochastic, which makes it hard to generalize. For example, one user building a sales assistant found that simply reordering the sentences in the prompt improved accuracy significantly.
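To make the sentence-order point concrete, here is a rough sketch of how one might compare orderings of the same prompt. This is not agenta's API: `fake_model`, `accuracy`, and `compare_orderings` are hypothetical names, and `fake_model` is a toy stand-in whose behaviour is invented purely so the script runs without an LLM. Swap in a real model call to get meaningful numbers.

```python
import itertools
import random
import statistics


def fake_model(prompt: str, question: str, rng: random.Random) -> str:
    """Hypothetical stand-in for an LLM call; replace with a real client.

    Toy assumption: it answers correctly more often when the prompt
    opens with the role sentence. Real models will behave differently.
    """
    p_correct = 0.8 if prompt.startswith("You are") else 0.6
    return "yes" if rng.random() < p_correct else "no"


def accuracy(prompt, dataset, rng, trials=20):
    """Mean and standard deviation of accuracy over repeated runs.

    Repeating the run matters because the outputs are stochastic:
    a single pass can make a worse prompt look better by chance.
    """
    scores = []
    for _ in range(trials):
        correct = sum(
            fake_model(prompt, question, rng) == expected
            for question, expected in dataset
        )
        scores.append(correct / len(dataset))
    return statistics.mean(scores), statistics.stdev(scores)


def compare_orderings(sentences, dataset, seed=0, trials=20):
    """Score every ordering of the prompt sentences on the same dataset."""
    rng = random.Random(seed)
    results = {}
    for order in itertools.permutations(sentences):
        prompt = " ".join(order)
        results[prompt] = accuracy(prompt, dataset, rng, trials)
    return results


# Made-up prompt sentences and test cases, for illustration only.
sentences = [
    "You are a sales assistant.",
    "Answer with yes or no.",
    "Be concise.",
]
dataset = [
    ("Is the premium plan billed yearly?", "yes"),
    ("Does the trial include support?", "yes"),
]

results = compare_orderings(sentences, dataset)
for prompt, (mean, spread) in sorted(results.items(), key=lambda kv: -kv[1][0]):
    print(f"{mean:.2f} +/- {spread:.2f}  {prompt}")
```

Looking at the spread next to the mean is the main point: if two orderings differ by less than the run-to-run variation, the difference probably isn't real.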