by noaflaherty
1205 days ago
Thanks! We totally agree that spot-checking won't scale long term. We're currently beta-testing a feature that lets you provide an "expected output" and then choose from a variety of comparison metrics (e.g. exact match, semantic similarity, Levenshtein distance) to derive a quantitative measure of output quality. The jury's still out on whether this is sufficient, but we're excited to keep pushing in this direction.

P.S. It's cool to hear from another company that's helping expand this market!
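For illustration only, here is a minimal sketch of how two of the metrics mentioned above could score a model output against an expected output. It is not the beta feature's actual implementation: the function names (`exact_match`, `levenshtein`, `levenshtein_similarity`) and the choice to normalize the edit distance to a 0-1 score are assumptions made for this example. Semantic similarity is left out because it would need an embedding model.

```python
def exact_match(expected: str, actual: str) -> float:
    """Return 1.0 if the strings match after trimming whitespace, else 0.0."""
    return 1.0 if expected.strip() == actual.strip() else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution (or match)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(expected: str, actual: str) -> float:
    """Hypothetical normalization: 1.0 means identical, 0.0 means nothing shared."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(expected, actual) / longest


if __name__ == "__main__":
    print(exact_match("Paris", "Paris "))               # 1.0
    print(levenshtein("kitten", "sitting"))             # 3
    print(levenshtein_similarity("abcd", "abcf"))       # 0.75
```

Exact match is strict and cheap, while the normalized edit distance gives partial credit for near misses. Neither one captures paraphrases, which is the gap a semantic-similarity metric would be meant to fill.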