by erlend_sh 342 days ago
Exactly. If we really want to benchmark the various models on the merits of their individual implementations, we should compare them all on the same open dataset.