| Hey everyone! I've been building AI agents lately and kept running into the same problem: how do you test them? Manually prompting the agent before every release is tedious and doesn't scale, and the existing solutions for testing agents are often complex to integrate. To help with this, I built a simple open-source testing framework that uses AI to validate AI: you define the expected behavior and let an LLM judge whether the output is semantically correct. The LLMJudge returns a score (0-1) and its reasoning for why the output passed or failed. You can try it live here (no signup needed): https://semantictest.dev The playground runs real LLMJudge validation, so you can see how the semantic testing works. The code is completely open source, and you can find extensive documentation here: https://docs.semantictest.dev Would love feedback from you guys! Thank you! |
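
For anyone wondering what "let an LLM judge the output" can look like in practice, here's a rough Python sketch of the general idea. To be clear, this is not the framework's actual API: `Verdict`, `judge`, `ask_llm`, the prompt wording, and the 0.7 pass threshold are all hypothetical names and values I made up for illustration, and `fake_llm` is a canned stand-in so the snippet runs without any model or API key.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass
class Verdict:
    score: float     # clamped to the range 0.0 .. 1.0
    reasoning: str   # the judge's explanation for the score
    passed: bool     # True when score >= threshold


def judge(
    output: str,
    expected: str,
    ask_llm: Callable[[str], str],  # hypothetical: any function mapping a prompt to reply text
    threshold: float = 0.7,         # assumed cutoff, not taken from the framework
) -> Verdict:
    """Ask a judge model whether `output` semantically matches `expected`."""
    prompt = (
        "You are grading an AI agent's answer.\n"
        f"Expected behavior: {expected}\n"
        f"Actual output: {output}\n"
        'Reply with JSON only: {"score": <number from 0 to 1>, "reasoning": "<why>"}'
    )
    reply = json.loads(ask_llm(prompt))
    # Clamp in case the judge returns a value outside 0..1.
    score = min(1.0, max(0.0, float(reply["score"])))
    return Verdict(score, str(reply["reasoning"]), score >= threshold)


def fake_llm(prompt: str) -> str:
    # Stand-in for a real model call; always returns the same canned verdict.
    return json.dumps({"score": 0.9, "reasoning": "Covers the refund window as expected."})


verdict = judge(
    output="You can get a refund within 30 days of purchase.",
    expected="Explains the refund window",
    ask_llm=fake_llm,
)
print(verdict.passed, verdict.score, verdict.reasoning)
```

In a real test you'd replace `fake_llm` with a call to whatever model you use as the judge, and assert on `verdict.passed` in your test runner. Passing the model call in as a plain function keeps the judging logic easy to unit test on its own.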