Hacker News
by JimmyRuska 2402 days ago
If you have historical data to validate against, you can set up a leaderboard for models run against the older data, and always hold part of the data out so it stays unavailable until final testing.

https://gluebenchmark.com/leaderboard/
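A minimal sketch of that setup in Python, assuming each historical record is a hypothetical `(feature, target)` pair and each model is just a callable. The helper names (`split_history`, `leaderboard`) and the mean-absolute-error metric are illustrative choices, not how any particular benchmark works. The held-out slice is returned separately so it can be locked away and never used when ranking.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record shape: (feature, target).
Example = Tuple[float, float]
Model = Callable[[float], float]


def split_history(data: Sequence[Example], holdout_frac: float = 0.2,
                  seed: int = 0) -> Tuple[List[Example], List[Example]]:
    """Shuffle once with a fixed seed and carve off a held-out test set.

    Returns (dev, test). Only `dev` is used for the leaderboard; `test`
    stays unavailable until a final evaluation.
    """
    rows = list(data)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - holdout_frac))
    return rows[:cut], rows[cut:]


def mean_abs_error(model: Model, rows: Sequence[Example]) -> float:
    """Average absolute difference between predictions and targets."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)


def leaderboard(models: Dict[str, Model],
                dev: Sequence[Example]) -> List[Tuple[str, float]]:
    """Score every model on the dev split and rank lowest error first."""
    scores = [(name, mean_abs_error(m, dev)) for name, m in models.items()]
    return sorted(scores, key=lambda s: s[1])
```

For time-ordered data, holding out the newest slice instead of a random one may be the more honest choice; the fixed seed here just keeps the split reproducible so every entry on the board is scored on the same rows.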

This encourages a simple first version followed by incremental complexity, rather than ending up six months into something very complex with no easy baseline to compare against. A simple baseline can also spawn several creative directions for improvement to research.
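As a rough illustration, here is one possible "simple first version" in Python: a baseline that always predicts the mean training target, plus a helper that reports what fraction of the baseline's error a candidate removes. The function names and the choice of a mean predictor are assumptions for the sketch; for classification, a majority-class predictor plays the same role.

```python
from statistics import mean
from typing import Callable, Sequence, Tuple

# Hypothetical record shape: (feature, target).
Example = Tuple[float, float]
Model = Callable[[float], float]


def mean_baseline(train: Sequence[Example]) -> Model:
    """Trivial model: ignore the input and predict the average target."""
    avg = mean(y for _, y in train)
    return lambda x: avg


def mae(model: Model, rows: Sequence[Example]) -> float:
    """Mean absolute error of `model` on `rows`."""
    return mean(abs(model(x) - y) for x, y in rows)


def improvement_over_baseline(candidate: Model, train: Sequence[Example],
                              validation: Sequence[Example]) -> float:
    """Fraction of the baseline's validation error removed by `candidate`.

    1.0 means perfect, 0.0 means no better than the baseline, and a
    negative value means the candidate is worse than the baseline.
    """
    base = mae(mean_baseline(train), validation)
    if base == 0:
        raise ValueError("baseline is already perfect; nothing to improve")
    return (base - mae(candidate, validation)) / base
```

Reporting every new idea as a number relative to the same cheap baseline keeps the comparison easy to read even once the models themselves get complicated.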

The other point is that models should also be run against simple cases whose results are easy to understand and easy to confirm. That way there is always a human QA step available to make sure the results are sensible.
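One way to keep that human QA step cheap is a small list of hand-written cases, each with a plain-language description a reviewer can check at a glance. This Python sketch assumes a single-input numeric model; `SanityCase` and `run_sanity_checks` are hypothetical names, and the tolerance is an arbitrary default.

```python
from typing import Callable, List, NamedTuple, Sequence


class SanityCase(NamedTuple):
    description: str        # what a human reviewer should expect, in words
    x: float                # model input
    expected: float         # answer that is obvious to confirm by hand
    tolerance: float = 1e-6


def run_sanity_checks(model: Callable[[float], float],
                      cases: Sequence[SanityCase]) -> List[str]:
    """Run each case and return readable failure messages (empty if all pass)."""
    failures = []
    for case in cases:
        got = model(case.x)
        if abs(got - case.expected) > case.tolerance:
            failures.append(
                f"{case.description}: expected {case.expected}, got {got}")
    return failures
```

For example, a model meant to double its input could be checked with `SanityCase("zero in, zero out", 0.0, 0.0)` and `SanityCase("3 doubles to 6", 3.0, 6.0)`; any returned message is phrased so a person can confirm the problem without digging into the model.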