by yelmahallawy
95 days ago
This makes sense. I'm particularly interested in your invoice processing app example, because the accuracy of its outputs can be measured quantitatively, from 0% to 100%. I'm curious what counts as _good enough_ and how many iterations it takes to get there. Is 100% the only acceptable threshold? If so, how many iterations does that take, and what does that process look like? Or, if 100% accuracy is too difficult to reach, how do you choose your minimum acceptable threshold? Is 95% accuracy good enough? Is 90%? Do you have a dedicated set of documents and outputs used for evals? I'd love to hear more about this example (if you worked directly on the evals for this app).