| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by ericlevine 103 days ago
	Totally fair feedback, and it’s true, many of these are synthetic evals with a few that were still synthetically produced but guided. At this point, because it’s all self-hosted, I only have my own data set. The places where it fails (for me) today are due to feature gaps rather than LLM mistakes. This is a new project that has not been widely announced, so my user base today is small but growing. If you give it a whirl and find it making mistakes, please send them my way! :)