Will never happen. As an ML researcher, I'd absolutely love to apply ML to healthcare but the data simply isn't there. Big companies will never release the data. Government intervention is needed, but that won't happen with the new administration...
Startups often have health data and an interest in machine learning. For example, we presented research at a NIPS workshop where we trained an LSTM to predict abnormal heart rhythms from about 793 million heart rate measurements. The heart rate measurements came from our Apple Watch app, Cardiogram, and the "gold standard" data came from a study we're running with UC San Francisco (wsj.com/articles/new-study-seeks-to-use-deep-learning-to-detect-heart-disease-1458240739).
It's not just us. The NIPS machine learning for healthcare workshop had hundreds of attendees this year from both industry and academia: https://www.nipsml4hc.ws
If you're an ML researcher or engineer and want to use machine learning to save lives, feel free to email me. I'm brandon@cardiogr.am. Happy to talk about our company or point you to relevant research.
I'd be more interested, as someone who has occasional atrial fibrillation, if my data resulted in any particular predictions for me as to severity, likely occurrence while I'm sleeping, etc. I'm 100% on board with my data benefitting all of humanity, but as a sufferer, how can it more immediately provide me with any actionable insights?
We do plan to expose predictions for you within Cardiogram. The biggest barrier is that any predictions relating to a health condition (like atrial fibrillation) are FDA-regulated, so there's a high level of scientific validation that must happen first.
Sounds like you have paroxysmal AF. I can't tell you much about any direct benefits ML could provide there.
I do work on data from persistent AF patients. Specifically, trying to predict AF recurrence after treatment with electrical cardioversion. Basically, electrical cardioversion is an effective treatment for some subset of persistent patients, but for another subset it is not. Doctors have a hard time deciding which patients can benefit from electrical cardioversion and which will not. If we can build a model that predicts this, we can avoid unnecessary procedures (which always carry some risk) and explore other treatment options instead. If this works well, it would directly benefit individuals.
The US is not the world. In some countries (e.g. UK's NHS), it might happen.
But the data isn't actually collected -- e.g., I was told Holter monitor recordings are not kept, only looked at and discarded if nothing is found; and data which is collected is often useless for automated analysis.
As someone who has worked in ML, I'm sad that's the case. As someone who also worked in security, I dread the day this data is properly collected. It will not be properly anonymized, it will be available to shady people for the right price, and it's more likely your enemies and insurers will know when your heart is going to fail before you do.
The NHS, as I understand it, is state-owned and a well-functioning organisation. So the NHS, if it chooses, can play a huge role in global health with all the ML and health data. Again, as you have rightly pointed out, it's about keeping the bad guys out.
It seems remarkably well suited in concept for playing such a role. However, in this respect (among others), it's not a well-functioning organisation. It's not even just one organisation.
Silently downvote me all you like, but only in the last month have I seen code I wrote before we even used CVS (remember that?) for source control still being used daily.
18 months ago I submitted a proposal for some software I had ready to go. I was happy to discuss costs but suggested a small amount (£10/user/month IIRC), with a costed business plan showing the savings it could make. They declined it, fine, but then gave the proposal document to an in-house developer who's been working full time on it since and still hasn't even shown a line of code to anyone.
I'd tried to get national level innovation funding for that so the local organisation wouldn't have to pay for it. NHS innovation money is only available for proven software. By which I mean having a full system tested by clinicians and ready to go isn't enough; it has to already be in production use to be innovation!
For the last couple of months a marketing agency have been trying to sell some software I've written across the NHS to help people collaborate, but they just can't do it. They've concluded it has to be sold to one small area at a time, literally starting in a GP surgery, for something that makes most sense rolled out across the organisation. Before I approached the marketing agency I'd emailed the front-door address of about 5 NHS IT organisations that claim to help suppliers improve the NHS, asking how to start the ball rolling; none of them got back to me.
There's a clue to the fractured nature of the NHS in the article linked elsewhere in the comments about Google getting AI data; they've only been able to get the data for 1.6 million patients. The NHS deal with that many patients in 2-3 days.
A month or two ago their email hit national headlines for going into meltdown after someone spammed most of the staff with the mailing list in CC and they all ended up replying to all to ask to be removed from the list.
I've been interested in application of ML to healthcare for nearly 20 years. In the early days, researchers just refused to share data; they were worried that you (using some fancy math or ML) would upstage them. I actually heard this first-hand from a faculty member.
Fast forward to today, and there is more openness. I did skim the paper mentioned here, but did not see any links to the actual data, which is a shame.
I think this is still true today. I worked at a large research institution, and the doctors were cut-throat when it came to keeping other people from using their data.
My understanding is that 70% of the difficulty of medical research is collecting the data. So once you've done that you want to make sure you've protected your investment of time and energy.
From an individual standpoint I understand why they do it. From a societal standpoint it's such a waste.
Also their trove of data is not a one-shot. They might collect a mountain of data on leukemia but publish one study on leukemia's correlation with high-power lines. Publishing this paper wouldn't necessitate opening up all of their data from either a horizontal or vertical perspective.
Seems like government intervention is the problem -- specifically, occupational licensing. There is a huge barrier right now for ML-healthcare startups that can already provide better diagnoses than normal doctors in some cases (and at a fraction of the cost), but are stuck because they cannot get their technology licensed to legally "practice medicine."
If I were a doctor I'd love to have an ML assistant that would also look over my patients' symptoms and find correlations that I missed. Is that any more "practicing medicine" than a reference book is?
So we need a free software effort to produce some kind of machine learning/statistical analysis tools that would not be sold but merely used by whoever wants them. The user should be a medical professional using the tool as an aid, much like a reference book. A good doctor doesn't take the book as gospel (at least my GP doesn't) but weighs what it says against the specifics of the case and his or her own experience. So perhaps a grassroots start could be possible.
Of course the major stumbling block of access to good data remains.
The UK's NHS recently opened up a large amount of its data to Google [0]. In parallel efforts, a company called Nuna is gathering and unifying data from state level Medicaid programs so it can be analyzed similarly [1].
A previous company I worked for was scared to death of holding any records that may be classified as health records because of the regulatory implications.
> As an ML researcher, I'd absolutely love to apply ML to healthcare but the data simply isn't there. Big companies will never release the data.
The big-data regime isn't the only regime. Probabilistic models can encode doctors' prior knowledge and also be trained with only a few dozen data points.
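To make that concrete, here is a minimal sketch of the simplest case I can think of: a Beta-Binomial model where the doctor's belief goes in as a prior and a couple dozen patients update it. The class name, the prior (about 30% relapse, held about as firmly as 10 patients' worth of experience) and the study counts are all made up for illustration; none of it comes from real data or from anyone's actual method in this thread.

```python
from dataclasses import dataclass


@dataclass
class BetaPrior:
    """Beta(alpha, beta) belief about the probability of an event."""
    alpha: float
    beta: float

    @property
    def mean(self) -> float:
        # Expected event probability under the current belief.
        return self.alpha / (self.alpha + self.beta)

    def update(self, events: int, non_events: int) -> "BetaPrior":
        # Conjugate update: observed counts are added to the pseudo-counts.
        return BetaPrior(self.alpha + events, self.beta + non_events)


# Hypothetical prior: clinician expects ~30% relapse, with the weight
# of roughly 10 previously seen patients (3 relapses, 7 non-relapses).
prior = BetaPrior(alpha=3.0, beta=7.0)

# Hypothetical small study: 24 patients, 12 of whom relapsed.
posterior = prior.update(events=12, non_events=12)

print(f"prior mean:     {prior.mean:.3f}")
print(f"posterior mean: {posterior.mean:.3f}")
```

The posterior lands between the doctor's guess and the raw rate in the small study, and a firmer prior (larger alpha + beta) would pull it closer to the guess. Real clinical models would be far richer than this, but the mechanism of combining prior knowledge with a few dozen data points is the same.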
Trampling privacy to extend life sounds like exactly the sort of thing Peter Thiel would be for and he seems to be running the show regarding the Trump administration and the FDA.
Is this true in all developed countries? What about healthcare systems owned and operated by the government? Seems plausible a government contract could provide access to such data.
> Will never happen. As an ML researcher, I'd absolutely love to apply ML to healthcare but the data simply isn't there
A famous (in medical circles) cardiologist, Eugene Braunwald, once said after a stint practicing in Mexico: "...they have the patients, we have the technology..."
The US is not the world.
What sorts of data sets are you looking for? They are probably available from other countries or could be more easily collected there. I have an interest in this area too.