by mousetraps 2934 days ago
> stigma coming from the academic side that dataset collection is a low-level problem not worthy of serious algorithmic investment

Agreed, it needs more attention, but for academia I think it's more of an incentive issue than a stigma issue. E.g., it's harder to benchmark the performance of two algorithms if they don't operate on the same dataset. Also, to be fair, research into things like synthetic data mitigates the problem, just in a different way.

The paper you cited is interesting. Thanks for sharing. Hopefully it spawns more focus on understanding the subtleties of each dataset. IIRC, Kaggle also had issues around generalizability, but for different reasons.

Anyway, it's still early on, but we're currently building tools to help solve this problem, in particular by simplifying the data collection/labeling process for vision systems. Would love to chat further with anyone interested in providing feedback. Email is sara@viewpointrobotics.com


It's indeed "an incentive issue", only not the one you've mentioned but the one OP hinted at. Research is focused on what's publishable, hence tenure-trackable, and not on what's useful for solving real-world problems (of course, the two occasionally coincide).
Why so black and white? There are many incentives at play, and many ways to contribute to solving real-world problems.

I’ve spent time in both academic research and industry.

Research is not supposed to be immediately applicable. The goal is to produce new knowledge and, more importantly, shared knowledge. Publishing is not a bad measure of that. Additionally, the ability to secure grants provides an incentive to focus on problems others want solved.

No incentive system is perfect, but I don’t really see how this is any different from any other organization. And I don’t think it’s fair to judge an entire discipline by its negative examples.

I didn’t judge anything, nor did I say anything “negative”.
Okay fair enough, maybe we’re just talking past each other :).