| HN Mirror



	by manidoraisamy 409 days ago
	Ideally, you want to start small and iterate. With Promptrepo, you can use versioning to compare model outputs across different datasets. In the test UI, we calculate confidence scores using @promptrepo/score [1], which parses OpenAI’s logprobs and shows field-level reliability. Fields with low confidence are highlighted in red, making it easy to catch signs of overfitting or data drift.

	[1] https://github.com/ManiDoraisamy/promptrepo-score
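
	To make the field-level confidence idea concrete, here is a rough Python sketch. It is not the @promptrepo/score API (that package's interface isn't shown here); it is an independent illustration that assumes the model returned a flat JSON object and that you have already pulled out (token, logprob) pairs, for example from the `token` and `logprob` fields of `choices[0].logprobs.content` in an OpenAI chat completion requested with `logprobs=True`. The function name `field_confidences`, the 0.8 threshold, and the sample tokens are all made up for the example.

```python
import json
import math


def field_confidences(tokens):
    """Estimate a confidence per top-level field of a flat JSON object.

    `tokens` is a list of (token_text, logprob) pairs whose texts
    concatenate to the JSON output. Assumes ASCII keys and no nesting.
    """
    text = "".join(tok for tok, _ in tokens)
    data = json.loads(text)
    decoder = json.JSONDecoder()

    # Record the character span each token covers in the full text.
    spans = []
    pos = 0
    for tok, logprob in tokens:
        spans.append((pos, pos + len(tok), logprob))
        pos += len(tok)

    scores = {}
    cursor = 0
    for key in data:
        quoted_key = json.dumps(key)
        key_start = text.index(quoted_key, cursor)
        colon = text.index(":", key_start + len(quoted_key))
        value_start = colon + 1
        while text[value_start].isspace():
            value_start += 1
        # raw_decode returns the parsed value and the index where it ends.
        _, value_end = decoder.raw_decode(text, value_start)
        # Joint probability of every token that overlaps the value span.
        total = sum(lp for start, end, lp in spans
                    if start < value_end and end > value_start)
        scores[key] = math.exp(total)
        cursor = value_end
    return scores


# Hypothetical tokens for the output {"name": "Alice", "age": 37}.
sample_tokens = [
    ('{"', -0.01), ("name", -0.02), ('":', -0.01), (' "', -0.01),
    ("Alice", -0.05), ('",', -0.01), (' "', -0.02), ("age", -0.03),
    ('":', 0.0), (" ", -0.01), ("3", -0.9), ("7", -1.2), ("}", 0.0),
]
scores = field_confidences(sample_tokens)
THRESHOLD = 0.8  # arbitrary cutoff chosen for this sketch
low_confidence = [k for k, v in scores.items() if v < THRESHOLD]
```

	Multiplying token probabilities (summing logprobs) penalizes long values more than short ones, so a real scorer might prefer the mean or the minimum token probability per field. Which aggregation @promptrepo/score uses is not stated above.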