by arka2147483647
810 days ago
Assume we have a child, and we test him regularly:

- Test 1: At first he can only draw squiggles on the math test.
- Test 2: Then he can do arithmetic correctly.
- Test 3: He fails only on the last details of the algebraic calculation.

Now, even though he fails every test, any reasonable parent would see that he is improving nicely, and would be able to work in his chosen field in a year or so.

Or alternatively, if we talk about AI, we can set the test as a threshold: if the results are continuously trending upwards, we can expect the curve to breach the threshold in the future. That is, measuring improvement, instead of pass/fail, allows one to predict when we might be able to use the AI for something.
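To make the "trend crossing a threshold" idea concrete, here is a rough sketch. It assumes each test gives a partial-credit score between 0 and 1 and that improvement is roughly linear over time. Both are simplifications for illustration, and the function name `predict_crossing` and the numbers are made up.

```python
def predict_crossing(times, scores, threshold):
    """Fit a least-squares line to (time, score) pairs and return the time
    at which that line reaches `threshold`.

    Returns None when the fitted trend is flat or downward, because then
    the line never reaches a higher threshold.
    """
    n = len(times)
    mean_t = sum(times) / n
    mean_s = sum(scores) / n
    var_t = sum((t - mean_t) ** 2 for t in times)
    cov = sum((t - mean_t) * (s - mean_s) for t, s in zip(times, scores))
    slope = cov / var_t
    intercept = mean_s - slope * mean_t
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


# Hypothetical partial-credit scores for the three tests, taken 6 months apart.
months = [0, 6, 12]
partial_scores = [0.1, 0.4, 0.7]
print(predict_crossing(months, partial_scores, threshold=1.0))

# The same three tests recorded as pass/fail: every score is 0, so there is
# no trend to extrapolate.
print(predict_crossing(months, [0, 0, 0], threshold=1.0))
```

With these made-up numbers the graded scores put the crossing at about month 18, while the pass/fail record gives nothing to extrapolate from.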
|
When you actually do these millions of tests, I don't think it really matters what the exact success metric is - an AI which is 'closer to correct, but still wrong' on one test will still get more tests correct overall across the whole dataset.
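A toy simulation of this point, under assumptions that are entirely made up for illustration: every test has a random difficulty between 0 and 1, the AI has a single hidden "skill" number, and it passes a test when skill plus some noise reaches the difficulty. Nothing here is a claim about how real benchmarks behave.

```python
import random


def pass_rate(skill, n_tests=100_000, seed=0):
    """Fraction of pass/fail tests passed by a model with the given skill.

    Assumed toy model: difficulty is uniform on [0, 1], and the model passes
    when skill plus Gaussian noise (std 0.1) is at least the difficulty.
    The same seed gives every skill level the same set of tests.
    """
    rng = random.Random(seed)
    passed = 0
    for _ in range(n_tests):
        difficulty = rng.random()
        noise = rng.gauss(0, 0.1)
        if skill + noise >= difficulty:
            passed += 1
    return passed / n_tests


# Each individual test is only pass/fail, but the aggregate rate still
# moves with the underlying skill.
for skill in (0.3, 0.5, 0.7):
    print(skill, pass_rate(skill))
```

In this toy setup the pass rate over many tests rises with the hidden skill, so the pass/fail count on a large dataset behaves like a graded score even though no single test gives partial credit.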