Hi HN, I built Simboba because we needed something for our own product, and I wanted to explore how dev tools change with AI coding assistants.

The hardest part of evals is creating good annotated test cases, so I focused on getting users there with the help of AI coding assistants.

How it works:

- AI drafts datasets from your codebase
- Web UI to review and refine cases
- Everything tracked in git
- Multi-turn conversation support
- LLM-as-judge + deterministic checks (e.g. was tool xyz called)
- Evals run as simple Python scripts

The AI coding assistant angle: I wanted users running customised evals in 5 minutes, not just a generic example. I started with a CLI command that generates a prompt for Claude Code / Cursor. That felt like too much friction. Then I tried a separate markdown file with instructions, but AI tools didn't find it easily.

What worked: clear instructions in the README plus structured schemas (Pydantic). The AI reads the README, understands your codebase and the structure the package requires, and drafts a real dataset for your product. Then it writes the eval script.

I'd love to know what you think, and I'm happy to answer questions!
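For anyone wondering what a structured case plus a deterministic check ("was tool xyz called") might look like, here is a rough sketch. It is not Simboba's actual schema or API: `Message`, `EvalCase`, `check_tools_called`, and the tool names are all made up for illustration, and it uses stdlib dataclasses instead of Pydantic so it runs with no dependencies.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    role: str      # "user", "assistant", or "tool"
    content: str


@dataclass
class EvalCase:
    # Hypothetical case shape, not Simboba's real schema.
    name: str
    turns: list[Message]                      # multi-turn conversation input
    expected_tools: list[str] = field(default_factory=list)


def check_tools_called(case: EvalCase, called_tools: list[str]) -> list[str]:
    """Return the expected tools that were never called (empty means pass)."""
    called = set(called_tools)
    return [tool for tool in case.expected_tools if tool not in called]


case = EvalCase(
    name="refund_request",
    turns=[
        Message("user", "I was charged twice for my order."),
        Message("assistant", "Sorry about that. Can you share the order ID?"),
        Message("user", "It's 1234."),
    ],
    expected_tools=["lookup_order", "issue_refund"],
)

# Pretend the agent under test only called one of the two expected tools.
missing = check_tools_called(case, called_tools=["lookup_order"])
print("PASS" if not missing else f"FAIL, missing tool calls: {missing}")
```

A check like this needs no model call, so it is cheap and repeatable; an LLM-as-judge step would sit alongside it for the parts that need judgement, such as tone or correctness of the final answer.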