Free evals API for AI startups (ship 10x faster with evals you can trust)

Y	Hacker News new \| ask \| show \| jobs

2 points by sfox100 328 days ago

Hey HN,

We built Composo because AI apps fail unpredictably and teams have no idea if their changes helped.

LLM-as-judge doesn't work - it gives random scores, doesn't work well for agents, and doesn't tell you what to fix.

We've built purpose-built evaluation models that give you: - Deterministic scores (same input = same score, always) - Instant identification of where prompts, retrievals, agents & tool calls fail - Exact failure analysis ("tool calls are looping due to poorly specified schema")

We're 92% accurate vs 72% for SOTA LLM-as-judge.

Giving 10 startups free access: - 10k eval credits - Just launched our evals API for agents & tool calling - 5 min setup

Already helping teams at Palantir, Accenture, and Tesla ship reliable AI.

Apply: composo.short.gy/startups

Happy to answer questions about evaluation, reward models, or why LLMs are bad at judging themselves. startups@composo.ai