by yelmahallawy 95 days ago
This makes sense. I'm particularly interested in your invoice-processing app example, because the accuracy of its outputs can be measured quantitatively, on a scale from 0% to 100%.

I'm curious about what counts as _good enough_ and how many iterations it takes to get there. Is 100% the only acceptable threshold? If so, how many iterations does that take, and what does that process look like? If 100% is too difficult to reach, how do you choose your minimum acceptable threshold (is 95% accuracy good enough? is 90%?)? Do you have a dedicated set of documents and expected outputs that you use for evals? I'd love to hear more about this example, if you worked directly on the evals for this app.