Hacker News
by arka2147483647 810 days ago
Assume we have a child, and we test him regularly:

- Test 1: First he can just draw squiggles on the math test

- Test 2: Then he can do arithmetic correctly

- Test 3: He fails only on the last details of the algebraic calculation.

Now, even though he fails every test, any reasonable parent would see that he is improving nicely, and that he will be able to work in his chosen field in a year or so.

Or alternatively, if we talk about AI, we can set the Test as a threshold, and we see the results are continuously trending upwards, and we can expect the curve to breach the threshold in the future.

That is, measuring improvement instead of pass/fail lets you predict when you might be able to use the AI for something.
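
To make the extrapolation idea concrete, here is a rough sketch, assuming the graded scores are roughly linear in the test number (a big assumption; real progress curves often aren't). The scores, the threshold of 1.0, and the `predict_crossing` helper are all made up for illustration.

```python
def predict_crossing(scores, threshold):
    """Fit a least-squares line to scores (x = test index 0, 1, 2, ...)
    and return the index where the line reaches the threshold.
    Returns None if there are too few points or no upward trend."""
    n = len(scores)
    if n < 2:
        return None
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(scores) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, scores))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope <= 0:
        return None  # flat or declining: no crossing predicted
    return (threshold - intercept) / slope


# Hypothetical partial-credit scores for Tests 1-3; all are below the pass mark.
scores = [0.1, 0.4, 0.7]
crossing = predict_crossing(scores, threshold=1.0)
print(crossing)
```

A pass/fail record for the same three tests would be fail, fail, fail, which gives the fit nothing to work with.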

1 comment

With AI you can run millions of tests. Some tests are easy by chance (e.g. "Please multiply this list of numbers by zero"). Some tests are answered correctly by chance alone, whether easy or hard.

When you actually do these millions of tests, I don't think it really matters what the exact success metric is - an AI which is 'closer to correct, but still wrong' on one test will still get more tests correct overall on the dataset of millions of tests.
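
A toy simulation of that claim, under assumptions I'm making up: each test has a difficulty drawn uniformly from [0, 1], and a model passes when its "skill" plus some Gaussian noise meets the difficulty. The `pass_rate` function and its parameters are hypothetical, not taken from any benchmark.

```python
import random


def pass_rate(skill, n_tests=100_000, noise=0.1, seed=0):
    """Fraction of simulated pass/fail tests a model with the given skill passes."""
    rng = random.Random(seed)  # fixed seed so both models face the same tests
    passed = 0
    for _ in range(n_tests):
        difficulty = rng.random()                # some tests are easy by chance
        attempt = skill + rng.gauss(0.0, noise)  # some answers are lucky or unlucky
        if attempt >= difficulty:
            passed += 1
    return passed / n_tests


weaker = pass_rate(0.30)
stronger = pass_rate(0.40)  # "closer to correct" on a typical test
print(weaker, stronger)
```

In this model the small per-test edge shows up as a clearly higher pass rate once the dataset is large, even though every individual result is only pass or fail.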