Hacker News new | ask | show | jobs
by heljakka 204 days ago
What are the main shortcomings of the solutions you tried out?

We believe you need to both automatically create the evaluation policies from OTEL data (data-first) and to bring in rigorous LLM judge automation from the other end (intent-first) for the truly open-ended aspects.