by shawntan 288 days ago

This would not help unless proper constraints are established on what data can and cannot be trained on. It would also help to figure out what the goal of the benchmark actually is.

If the goal is to test generalisation capability, then knowing what data the evaluated model was trained on is crucial to drawing any conclusions.

Look at the construction of this synthetic dataset for example: https://arxiv.org/pdf/1711.00350
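
To make the point concrete, here is a toy sketch of a controlled train/test split. It is not taken from the linked paper: the vocabulary, the function names, and the choice of "jump" as the held-out word are all made up for illustration. The idea is only that the benchmark fixes what may appear in training, so a test score says something about generalisation.

```python
from itertools import product

# Hypothetical toy vocabulary, invented for illustration only.
VERBS = ["walk", "run", "look", "jump"]
MODIFIERS = ["", "twice", "left", "right"]


def build_commands():
    """Enumerate every verb/modifier combination as a command string."""
    return [f"{v} {m}".strip() for v, m in product(VERBS, MODIFIERS)]


def split_by_held_out_verb(commands, held_out):
    """Keep the held-out verb in training only in its bare form.

    Every composed command containing it goes to the test set, so the
    test set only contains combinations never seen during training.
    """
    train, test = [], []
    for cmd in commands:
        words = cmd.split()
        if held_out in words and len(words) > 1:
            test.append(cmd)
        else:
            train.append(cmd)
    return train, test


train, test = split_by_held_out_verb(build_commands(), "jump")
print(len(train), "training commands,", len(test), "test commands")
```

If a model has seen arbitrary extra data on top of `train`, the split no longer guarantees anything, which is why the constraint on training data has to be part of the benchmark definition.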