Hacker News
by optimalsolver 479 days ago
You need benchmarks with the following three properties:

1) No known solutions, so there's no "ground truth" dataset to train on

2) Presumably hard to solve

3) But easy to verify a solution if one is provided.
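
As a minimal sketch of the solve/verify asymmetry in properties 2 and 3, assuming integer factorization as a stand-in task (the task choice and both function names are hypothetical, not taken from any existing benchmark):

```python
def verify_factorization(n: int, factors: list[int]) -> bool:
    """Cheap check: at least two factors, each greater than 1, whose product is n."""
    if len(factors) < 2 or any(f <= 1 for f in factors):
        return False
    product = 1
    for f in factors:
        product *= f
    return product == n


def solve_by_trial_division(n: int) -> list[int]:
    """Naive solver: trial division, which gets slow when n has only large prime factors."""
    factors = []
    d = 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)
    return factors
```

A grader would only need `verify_factorization`, which costs a handful of multiplications; the solver is included just to show where the hard part sits.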

Building such benchmarks is, of course, easier on the STEM side of things, but how do you automatically test creativity or philosophical aptitude?

1 comment

I guess it's purely subjective. Maybe some internal committee could judge the quality of creative work?