| HN Mirror



	by bisonbear 77 days ago
	Very cool, interested to read more once you post! FWIW I've been building eval infra that does something adjacent: replaying real repo work against different agent configs and measuring the agent's quality along several dimensions (pass/fail, but also alignment with human intent, code review, etc.). If you want to compare notes on harness design, or if an independent eval of lat vs. no-lat on quickjs would be useful, happy to chat :)