by 99052882514569 2378 days ago
>Yes, a dataset like that would be very interesting and I would happily play around with it for a few weeks and see what I can come up with.

Spare-time tinkerers aren't really the intended audience here:

>>The neuro data set will allow researchers to test their models with data from additional machine types, new sequence types, and different coil configurations that were not present in the previously released fastMRI knee data set. Radiologists also look for different diagnostic properties (such as contrast in texture between different neural tissue) in brain MRIs. These differences present an interesting and challenging machine learning problem to solve and will help researchers develop models that generalize to more clinical settings.

It's unlikely in the extreme that anyone in that audience will be stopped by a simple data sharing agreement. And it's also unlikely in the extreme that anyone outside that audience will know what to do with a bunch of raw k-space MR datasets. Domain knowledge is an absolute necessity with this data.


Domain knowledge, while useful, is not a necessity - I'm fine with contacting a specific domain expert for help and even paying for his or her services (with my own money at that), though none of the ones I've contacted ever wanted anything for the time they spent on a problem I presented them with. To put it as a question: do I really need to explain how open source works in theory and practice on a place like Hacker News? Especially given that most of the world's infrastructure in practice runs thanks to the collaboration of millions who have built most of what we use in their "spare time", as you call it.

Same applies to DNNs (if not more so) - take any large DNN, along with the papers, data set and code, to the author and ask them to explain why it works as well as it does while it performs terribly on a different data set, even a similar one. "Well yeah, it's curve fitting which works here but doesn't work there". Why? ¯\_(ツ)_/¯

My rant concerns a different problem - if you have such data and you want to share it, just go ahead and do it. One in every 10,000 people might do something useful with it, but we aren't talking about nuclear experiments where something can blow up, are we? Worst case scenario, someone's CPU or GPU might overheat, big deal. Just ditch the entire bureaucratic crap, we have enough of that as it is in our daily lives.

This is human-subject data, and there are good reasons to set legal limits on how that data is used, even if it is anonymized. Your "rant" is not well informed.

>Domain knowledge, while useful, is not a necessity - I'm fine with contacting a specific domain expert for help and even paying for his or her services (with my own money at that),

Nah, it's the other way around. Specifically in this domain, innovation is definitely driven by domain experts (researchers, medical equipment manufacturers), and they contract out or hire technical expertise as they need it.

>To put it as a question: do I really need to explain how open source works in theory and practice on a place like Hacker News?

Your definition of open source is much too narrow and does not include what these researchers meant by it.

>Especially given that most of the world's infrastructure in practice runs thanks to the collaboration of millions who have built most of what we use in their "spare time", as you call it.

Certainly not in medicine.

>Same applies to DNNs (if not more so) - take any large DNN, along with the papers, data set and code, to the author and ask them to explain why it works as well as it does while it performs terribly on a different data set, even a similar one. "Well yeah, it's curve fitting which works here but doesn't work there". Why? ¯\_(ツ)_/¯

You've identified the precise reason why all this "machine learning" stuff has been such a dud when it comes to medicine. "It's curve fitting which works here but doesn't work there" is not good enough and never will be, particularly when the model works so well on the dataset you overfitted on while performing so terribly on other datasets.

>My rant concerns a different problem - if you have such data and you want to share it, just go ahead and do it. One in every 10,000 people might do something useful with it, but we aren't talking about nuclear experiments where something can blow up, are we? Worst case scenario, someone's CPU or GPU might overheat, big deal. Just ditch the entire bureaucratic crap, we have enough of that as it is in our daily lives.

They spell out the reasons for requiring a data sharing agreement in the agreement itself. They want you to cite the dataset, they don't want you to sell it, and they want you to confirm that you understand you're getting a dataset with no warranties, FDA approvals, etc.

And as I said before, the chances of you doing something useful with it may be 1 in 10,000, but the chances of you doing something useful with it while not being motivated enough to sign a simple agreement and wait for the download link are essentially 0.