by sumanyusharma 666 days ago
Absolutely agree that creating effective evals requires domain expertise. Right now, we're co-building evals with customers, but we're identifying which aspects can be productized.

Regarding text-based evals — part of testing voice agents involves assessing their core reasoning logic. To do that, we bypass the voice layer and simulate conversations via text. So yes, the core simulation engine is reusable for both conversational text and voice interactions.
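
To make the "bypass the voice layer" idea concrete, here is a minimal sketch in Python. It is not our actual engine: `simulate`, `with_voice_layer`, `toy_agent`, and the fake TTS/STT functions are all hypothetical names made up for illustration. The assumption is that the agent's reasoning core can be treated as a plain text-in, text-out function, so the same simulation loop can drive it directly or through a speech wrapper.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical interface: the agent's reasoning core maps user text to reply text.
TextAgent = Callable[[str], str]


@dataclass
class Transcript:
    turns: List[Tuple[str, str]] = field(default_factory=list)  # (speaker, text)


def simulate(agent: TextAgent, user_turns: List[str]) -> Transcript:
    """Drive the agent with scripted user turns and record the conversation."""
    transcript = Transcript()
    for user_text in user_turns:
        transcript.turns.append(("user", user_text))
        transcript.turns.append(("agent", agent(user_text)))
    return transcript


def with_voice_layer(agent: TextAgent,
                     tts: Callable[[str], bytes],
                     stt: Callable[[bytes], str]) -> TextAgent:
    """Wrap the same core so each turn passes through stand-in TTS and STT."""
    def voiced(user_text: str) -> str:
        heard = stt(tts(user_text))    # user speech -> the agent's transcription
        return stt(tts(agent(heard)))  # agent speech -> what the user hears
    return voiced


# Toy stand-ins (assumptions for the sketch, not real speech models).
def toy_agent(text: str) -> str:
    if "order" in text.lower():
        return "Your order has shipped."
    return "Could you say more?"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


script = ["Where is my order?", "Thanks"]
text_run = simulate(toy_agent, script)  # voice layer bypassed
voice_run = simulate(with_voice_layer(toy_agent, fake_tts, fake_stt), script)
```

With lossless stand-ins the two transcripts are identical, which is the point: the text run isolates the reasoning logic, and any difference that shows up once real speech components are swapped in can be attributed to the voice layer.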

We're also excited about shipping the ability to replay a simulated conversation inspired by a real user!