HN Mirror

	by heljakka 204 days ago
	What are the main shortcomings of the solutions you tried out? We believe you need both to create the evaluation policies automatically from OTEL data (data-first) and to bring in rigorous LLM-judge automation from the other end (intent-first) for the truly open-ended aspects.