Yes, I'm not sure I really want to vibe code something that runs auto evals on a sample of my OTEL traces any more than I want to build my own analytics library.
I've used and liked Promptfoo a lot, but I've run into issues when trying to run evaluations with too many independent variables. It works great for `models * prompts * variables`, but it broke down when we wanted `models * prompts * variables^x`.
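
To put rough numbers on that, here's a minimal sketch in plain Python of how the test matrix grows. This is not Promptfoo's API; `grid_size`, `build_grid`, and the counts are hypothetical and only for illustration. I'm also assuming `variables^x` means `x` independent variables that each take their own set of values.

```python
from itertools import product


def grid_size(n_models, n_prompts, values_per_variable, n_variables):
    """Number of eval cases when every variable has the same number of values."""
    return n_models * n_prompts * values_per_variable ** n_variables


def build_grid(models, prompts, variables):
    """Yield one eval case per combination.

    `variables` maps a (hypothetical) variable name to its list of values.
    """
    names = list(variables)
    for model, prompt, *values in product(models, prompts, *variables.values()):
        yield {"model": model, "prompt": prompt, "vars": dict(zip(names, values))}


# One variable with 5 values: 3 * 4 * 5 cases.
print(grid_size(3, 4, 5, 1))  # 60
# Three independent variables with 5 values each: 3 * 4 * 5**3 cases.
print(grid_size(3, 4, 5, 3))  # 1500
```

Each extra independent variable multiplies the whole matrix by its value count, so the case count gets out of hand quickly.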