Hacker News

by pak 3365 days ago

We need to see the full, published study and its methods (particularly around recruitment and exclusion criteria) before we can judge it properly. Until then, the reported accuracy, sensitivity, and specificity may bear no relation to real-world usage if the cohort and data quality were tightly controlled, as you'd expect for an initial study involving the makers of the algorithm. A few other thoughts:

1. Even at 98% sensitivity and 90% specificity [0] (which I don't think would hold up in real-world use by casual, healthy users), if AFib has a prevalence of roughly 2-3% [1], then by a quick back of the envelope calculation a positive test result is still 5× more likely to be a false positive than a true positive. With those odds, I don't think many cardiologists are going to answer the phone. You'd still need an EKG to diagnose AFib.

2. There is huge variance in how people actually use wearable sensors, and in the quality of the sensors themselves. (Imagine people who wear the watch looser, sweat more, have different skin, move it around a lot, etc.) You'd likely need an open, third-party validation study of the accuracy of the Apple Watch sensors before you can expect doctors to use the data. My understanding is that the Apple Watch sensors are actually pretty good compared to other wearable sensors, but I don't know of any rigorous study that compares them to an EKG.

3. Obviously, this is only for AFib. AFib is a sweet corner case in terms of extrapolating from heart rate to arrhythmia, because it's a rapid & irregular rhythm that probably contains some subpatterns in beats that are hard for humans to appreciate. As others, including Cardiogram themselves [2], have pointed out previously, many serious arrhythmias cannot be detected with an optical heart rate sensor alone.
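The back of the envelope calculation in point 1 can be reproduced with Bayes' rule. This is a minimal sketch that uses only the figures quoted above (98% sensitivity, 90% specificity, 2-3% prevalence); the function names are illustrative, not from any library.

```python
def false_to_true_positive_ratio(sensitivity: float, specificity: float, prevalence: float) -> float:
    """False positives per true positive among all positive results (requires prevalence > 0)."""
    true_pos = sensitivity * prevalence                   # P(test positive and AFib)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)  # P(test positive and no AFib)
    return false_pos / true_pos


def positive_predictive_value(sensitivity: float, specificity: float, prevalence: float) -> float:
    """P(AFib | positive test), via Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


if __name__ == "__main__":
    for prev in (0.02, 0.025, 0.03):
        ratio = false_to_true_positive_ratio(0.98, 0.90, prev)
        ppv = positive_predictive_value(0.98, 0.90, prev)
        print(f"prevalence {prev:.1%}: {ratio:.1f} false positives per true positive, PPV {ppv:.1%}")
```

With these inputs the ratio is 5.0 at 2% prevalence, about 4.0 at 2.5%, and about 3.3 at 3%, so 5× is the upper end of the quoted prevalence range; the corresponding PPV runs from roughly 17% to 23%.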

[0]: https://blog.cardiogr.am/applying-artificial-intelligence-in...

[1]: https://www.ncbi.nlm.nih.gov/pubmed/24966695

[2]: https://blog.cardiogr.am/what-do-normal-and-abnormal-heart-r...


brandonb 3365 days ago

Full journal publication is coming; as you likely know, the system doesn't always move as fast as we'd like.

> quick back of the envelope calculation a positive test result is still 5× more likely to be a false positive than a true positive.

For what it's worth, about 10% of people who come in to the cardiology clinic experiencing symptoms are diagnosed with an abnormal heart rhythm. So even a 20% positive predictive value would be an improvement over the status quo.

As mentioned below, you can use other risk factors (like CHA2DS2-VASc, or even simply age) to raise the pre-test probability, and thereby control the false positive rate.
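The effect of raising the pre-test probability can be made concrete with the positive likelihood ratio, LR+ = sensitivity / (1 - specificity), where post-test odds = pre-test odds × LR+. A minimal sketch reusing the 98%/90% figures from the parent comment; the pre-test probabilities in the loop are arbitrary illustrative values, not estimates for any real subgroup.

```python
def post_test_probability(pre_test_prob: float, sensitivity: float, specificity: float) -> float:
    """Probability of disease after a positive result (requires specificity < 1)."""
    lr_positive = sensitivity / (1.0 - specificity)  # how much a positive result multiplies the odds
    pre_odds = pre_test_prob / (1.0 - pre_test_prob)
    post_odds = pre_odds * lr_positive
    return post_odds / (1.0 + post_odds)


if __name__ == "__main__":
    # Illustrative pre-test probabilities only, e.g. an unselected user base
    # versus a hypothetical older or higher-risk subgroup.
    for pre in (0.02, 0.05, 0.10, 0.20):
        post = post_test_probability(pre, 0.98, 0.90)
        print(f"pre-test {pre:.0%} -> post-test {post:.0%}")
```

Here LR+ is 9.8, so moving the pre-test probability from 2% to 10% lifts the post-test probability from about 17% to about 52%.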

As a meta-point, I do think we let the perfect be the enemy of the good in medicine, and that potentially scares people away who could otherwise make positive contributions. For example, many of the most common screening methods in use today are simple, linear models with c-statistics below 0.8. You can build a far-from-perfect system and still improve dramatically on how people receive healthcare today.
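For ML readers, the c-statistic is the concordance probability, which for a binary outcome equals the ROC AUC: the chance that a randomly chosen positive case gets a higher risk score than a randomly chosen negative one, with ties counting as half. A minimal brute-force sketch on made-up scores (not data from any study):

```python
from itertools import product


def c_statistic(scores_pos: list[float], scores_neg: list[float]) -> float:
    """Concordance (ROC AUC) by comparing every positive/negative pair."""
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative case")
    concordant = 0.0
    for p, n in product(scores_pos, scores_neg):
        if p > n:
            concordant += 1.0  # positive case ranked above negative case
        elif p == n:
            concordant += 0.5  # tie counts as half
    return concordant / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Made-up risk scores for illustration only.
    positives = [0.9, 0.7, 0.6, 0.4]
    negatives = [0.8, 0.5, 0.3, 0.2, 0.1]
    print(c_statistic(positives, negatives))
```

A value of 0.5 is a coin flip and 1.0 is perfect ranking; a c-statistic of 0.8 means about one in five positive/negative pairs is still ordered the wrong way.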

My overall message to machine learning practitioners sitting on the sidelines would be: please join our field. The status quo in medicine is much more primitive than we have been led to believe, and your skills can very literally save lives.


pak 3365 days ago

Thanks for replying! I'll certainly be looking forward to the publication.

> about 10% of people who come in to the cardiology clinic experiencing symptoms are diagnosed with an abnormal heart rhythm

OK, but I'd be more careful about keeping your comparisons apples-to-apples; your app is about asymptomatic AFib. So how many of those people going to the cardiology clinic had undiagnosed AFib? For how many of them would a new AFib diagnosis have changed the plan of care? And so on. Like robbiep was saying, I would be interested in the actual added value from the larger perspective.

Totally appreciate your point about perfect being the enemy of the good. The danger is that these semi-medical wearables currently straddle a strange zone between medical and consumer use. The inevitable marketing strategy is to co-opt the positive reputation of medical products while acknowledging none of the pitfalls of consumer products. Most of the screening methods you bring up are used by a doctor on symptomatic patients with a suggestive history, and only as one component of clinical judgement. The way Cardiogram seems to make the most money, on the other hand, is to sell the product to asymptomatic, casual users. (Furthermore, CHA2DS2-VASc costs 30 seconds of talking or reading a medical record, not $700 in Apple products.) So you're inevitably running up against some doubts among physicians [0].
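To illustrate why CHA2DS2-VASc takes only seconds: it is an additive point score (developed for stroke risk in AFib) that can be tallied from a short history. A minimal sketch of the standard point assignments; the `Patient` field names are hypothetical, chosen for this example, and this is not a clinical tool.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical field names for illustration.
    age: int
    female: bool
    congestive_heart_failure: bool
    hypertension: bool
    diabetes: bool
    prior_stroke_tia_or_thromboembolism: bool
    vascular_disease: bool  # e.g. prior MI, peripheral artery disease, aortic plaque


def cha2ds2_vasc(p: Patient) -> int:
    """Additive CHA2DS2-VASc score, range 0 to 9."""
    score = 0
    score += 1 if p.congestive_heart_failure else 0              # C
    score += 1 if p.hypertension else 0                          # H
    if p.age >= 75:
        score += 2                                               # A2: age >= 75
    elif p.age >= 65:
        score += 1                                               # A: age 65-74
    score += 1 if p.diabetes else 0                              # D
    score += 2 if p.prior_stroke_tia_or_thromboembolism else 0   # S2
    score += 1 if p.vascular_disease else 0                      # V
    score += 1 if p.female else 0                                # Sc: sex category
    return score


if __name__ == "__main__":
    example = Patient(age=72, female=True, congestive_heart_failure=False,
                      hypertension=True, diabetes=False,
                      prior_stroke_tia_or_thromboembolism=False, vascular_disease=False)
    print(cha2ds2_vasc(example))
```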

And finally, I agree that more machine learning practitioners should join medical research. I hope the field works to set more reasonable expectations, however, such as: ML will solve very specific subtasks in clinical reasoning (as in the diabetic retinopathy study [1]). Instead, the headlines usually ratchet that up to "AI will replace radiology/cardiology/$specialty in X years." That tends to hurt the people currently in the trenches, since it diminishes their contribution to bringing about practical, incremental change. The top answer of this Quora thread [2] has a good discussion of the many dimensions of the problem.

[0]: https://twitter.com/Abraham_Jacob/status/860119573915287552

[1]: http://jamanetwork.com/journals/jama/fullarticle/2588763

[2]: https://www.quora.com/Why-is-machine-learning-not-more-widel...


epmaybe 3365 days ago

The diabetic retinopathy study (and the somewhat recent Stanford dermatology study) were the first ML studies I had read about that blew me away with their sensitivities and specificities compared to real doctors. Your comment on specific subtasks is perfect, and I try to use these examples when discussing ML with fellow medical students.

However, like you said, the medical field is very slow, and has quite a lot of inertia to maintain the status quo. Unless insurance companies refuse to compensate practitioners who don't use these tools, I fear that few, if any, in the healthcare field will opt to use such techniques.

And finally: How should someone with both a medical and computer science background get into ML?


pak 3360 days ago

I found the Statistical Learning self-paced course on Stanford's site to be a great formal intro to ML algorithms implemented in R, and it is taught by the inimitable Hastie and Tibshirani: http://statlearning.class.stanford.edu

This post on ML in medicine is a pretty good overview of everything that has been going on recently and the nuances often lost in the current hype: https://lukeoakdenrayner.wordpress.com/2016/11/27/do-compute...


nraynaud 3365 days ago

About 1): it would still be far better than many, many medical tests.


seizethecheese 3365 days ago

> by a quick back of the envelope calculation a positive test result is still 5× more likely to be a false positive than a true positive. With those odds, I don't think many cardiologists are going to answer the phone. You'd still need an EKG to diagnose AFib.

This is a good point, and certainly nobody should go directly to a cardiologist based on these results. It seems that this would be a good system to recommend that people get an EKG done, though.
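The "screen with the watch, confirm with an EKG" workflow can be sized with the same numbers from the top comment. A minimal sketch, assuming the 98%/90% figures hold in casual users (which the top comment doubts), a 2.5% prevalence (the midpoint of the quoted range), and, as a simplification, that the EKG is a perfect reference standard; the function name and the cohort of 1,000 users are illustrative.

```python
def screening_funnel(n_users: int, prevalence: float, sensitivity: float,
                     specificity: float) -> dict[str, float]:
    """Expected counts when watch-positive users are referred for a confirmatory EKG."""
    with_afib = n_users * prevalence
    without_afib = n_users - with_afib
    true_pos = with_afib * sensitivity               # AFib cases flagged by the watch
    false_pos = without_afib * (1.0 - specificity)   # healthy users flagged anyway
    referrals = true_pos + false_pos                 # EKGs ordered
    return {
        "ekg_referrals": referrals,
        "afib_confirmed": true_pos,
        "afib_missed": with_afib - true_pos,         # false negatives, never referred
        "ekgs_per_case_found": referrals / true_pos if true_pos else float("inf"),
    }


if __name__ == "__main__":
    for key, value in screening_funnel(1000, 0.025, 0.98, 0.90).items():
        print(f"{key}: {value:.1f}")
```

Under these assumptions that is roughly 122 EKGs and about five per confirmed case, compared with 40 per case (1,000 EKGs for 25 cases) if every user got an EKG directly.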


JshWright 3365 days ago

> probably contains some subpatterns in beats that are hard for humans to appreciate

Not really, no... As you said, AFib is one of a very small number of causes of irregularly irregular heart rates (and is by far the most common). AFib is pretty easy to spot, even just by feeling someone's pulse with your fingers.
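Irregularity of this kind shows up in beat timing alone. As a rough illustration, here is a toy heuristic that flags an "irregularly irregular" series of beat-to-beat (RR) intervals using the root mean square of successive differences divided by the mean RR interval (a normalized RMSSD). This is an illustrative sketch, not Cardiogram's algorithm or a validated detector; the 0.1 threshold and the sample data are made up.

```python
import math
import random


def normalized_rmssd(rr_intervals_ms: list[float]) -> float:
    """RMSSD of successive RR-interval differences, divided by the mean RR interval."""
    if len(rr_intervals_ms) < 2:
        raise ValueError("need at least two RR intervals")
    diffs = [b - a for a, b in zip(rr_intervals_ms, rr_intervals_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    return rmssd / (sum(rr_intervals_ms) / len(rr_intervals_ms))


def looks_irregularly_irregular(rr_intervals_ms: list[float], threshold: float = 0.1) -> bool:
    # Hypothetical threshold chosen for illustration only.
    return normalized_rmssd(rr_intervals_ms) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: a steady rhythm with small jitter vs. intervals drawn at random.
    regular = [800 + rng.uniform(-20, 20) for _ in range(60)]
    irregular = [rng.uniform(400, 1000) for _ in range(60)]
    print(looks_irregularly_irregular(regular), looks_irregularly_irregular(irregular))
```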
