by manidoraisamy
409 days ago
Ideally, you want to start small and iterate. With Promptrepo, you can use versioning to compare model outputs across different datasets. In the test UI, we calculate confidence scores using @promptrepo/score [1], which parses OpenAI’s logprobs and shows field-level reliability. Fields with low confidence are highlighted in red, making it easy to catch signs of overfitting or data drift.

[1] https://github.com/ManiDoraisamy/promptrepo-score
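To make the field-level confidence idea concrete, here is a rough Python sketch. It is not the @promptrepo/score API or its actual scoring method, which may differ. It assumes you have already aligned the token logprobs returned by the model to the fields of the extracted output. The function names, the sample data, and the 0.8 threshold are all hypothetical.

```python
import math


def field_confidence(token_logprobs):
    """Joint probability of the tokens that make up one field value.

    Assumption: confidence is exp(sum of token logprobs). Other choices,
    such as the minimum or the mean token probability, are equally plausible.
    """
    if not token_logprobs:
        raise ValueError("field has no tokens")
    return math.exp(sum(token_logprobs))


def low_confidence_fields(fields, threshold=0.8):
    """Return the sorted names of fields whose confidence is below threshold."""
    return sorted(
        name
        for name, logprobs in fields.items()
        if field_confidence(logprobs) < threshold
    )


# Hypothetical per-field logprobs, as if already aligned to the tokens
# of each value in a structured (JSON) model response.
example = {
    "invoice_number": [-0.01, -0.02],
    "total": [-0.9, -0.4],
}

flagged = low_confidence_fields(example)
print(flagged)
```

With this sample data only the "total" field falls below the threshold, so it is the one a UI would highlight. Multiplying token probabilities penalizes long values, so a per-token average may be a better fit for free-text fields.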