>> If a non-standard dataset is being used, I would expect there to be a discussion/analysis on what characteristics of that dataset made it unusable for this paper.

Unfortunately, that just adds more work for the reviewer, which gives many reviewers a motive to reject the paper rather than do the extra work.

That sounds mean, so I will quote (yet again) Geoff Hinton on things that "make the brain hurt":

GH: One big challenge the community faces is that if you want to get a paper published in machine learning now it's got to have a table in it, with all these different data sets across the top, and all these different methods along the side, and your method has to look like the best one. If it doesn’t look like that, it’s hard to get published. I don't think that's encouraging people to think about radically new ideas.

Now if you send in a paper that has a radically new idea, there's no chance in hell it will get accepted, because it's going to get some junior reviewer who doesn't understand it. Or it’s going to get a senior reviewer who's trying to review too many papers and doesn't understand it first time round and assumes it must be nonsense. Anything that makes the brain hurt is not going to get accepted. And I think that's really bad.

https://www.wired.com/story/googles-ai-guru-computers-think-...

Basically a new dataset is like a new idea: it makes the brain hurt, for the overburdened experienced researcher or inexperienced younger researcher alike. Testing a new approach on a new dataset? That makes brain go boom.

Which is a funny state of affairs. Not so long ago, one sure-fire way to make a significant contribution that would give your paper a leg up over the competition was to create a new dataset. I was advised as much at the start of my PhD (four-ish years ago). Seems like this has already changed.