|
We need to see the full, published study and its methods (particularly around recruitment and exclusion criteria) before we can judge it properly. Until then, the presented statistics about accuracy, sensitivity, and specificity potentially bear no relation to real-world usage if the cohort and data quality were tightly controlled, as you'd expect for an initial study involving the makers of the algorithm.

A few other thoughts:

1. Even at 98% sensitivity and 90% specificity [0], which I don't think would hold up with real-world usage in casual, healthy users, if AFib has a prevalence of roughly 2-3% [1] then by a quick back of the envelope calculation a positive test result is still 5× more likely to be a false positive than a true positive. (At 2% prevalence, about 9.8% of all users would be false positives versus about 2% true positives; at 3% the ratio drops to roughly 3.3×.) With those odds, I don't think many cardiologists are going to answer the phone. You'd still need an EKG to diagnose AFib.

2. There is huge variance in people's real-world use of wearable sensors, and also in the quality of the sensors. (Imagine people who wear the watch looser, sweat more, have different skin, move it around a lot, etc.) You'd likely need an open, third-party validation study of the accuracy of the sensors in the Apple Watch before you can expect doctors to use the data. My understanding is that the Apple Watch sensors are actually pretty good compared to other wearable sensors, but I don't know of any rigorous study that compares them to an EKG.

3. Obviously, this is only for AFib. AFib is a sweet corner case in terms of extrapolating from heart rate to arrhythmia, because it's a rapid and irregular rhythm that probably contains some subpatterns in beats that are hard for humans to appreciate. As others, including Cardiogram themselves [2], have pointed out previously, many serious arrhythmias are not possible to detect with only an optical heart rate sensor.

[0]: https://blog.cardiogr.am/applying-artificial-intelligence-in...
[1]: https://www.ncbi.nlm.nih.gov/pubmed/24966695
[2]: https://blog.cardiogr.am/what-do-normal-and-abnormal-heart-r... |
> quick back of the envelope calculation a positive test result is still 5× more likely to be a false positive than a true positive.
For what it's worth, about 10% of people who come in to the cardiology clinic experiencing symptoms are diagnosed with an abnormal heart rhythm. So even a 20% positive predictive value would be an improvement over the status quo.
As mentioned below, you can use other risk factors (like CHA2DS2-VASc, or even simply age) to raise the pre-test probability, and thereby reduce the share of positive results that are false.
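To make the arithmetic in this thread concrete, here is a rough sketch of how positive predictive value moves with pre-test probability, using the 98% sensitivity and 90% specificity quoted above. The function names are made up for illustration, and the 10% and 20% pre-test probabilities are hypothetical values for a risk-stratified group, not figures from any study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value via Bayes' rule."""
    true_pos = sensitivity * prevalence                    # sick and flagged
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # healthy but flagged
    return true_pos / (true_pos + false_pos)


def false_to_true_ratio(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """How many false positives to expect per true positive."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return false_pos / true_pos


# Figures quoted in the thread; real-world values may well be lower.
SENSITIVITY = 0.98
SPECIFICITY = 0.90

# 2-3% is the general-population estimate cited above; 10% and 20% are
# hypothetical pre-test probabilities after selecting on risk factors.
for prevalence in (0.02, 0.03, 0.10, 0.20):
    p = ppv(SENSITIVITY, SPECIFICITY, prevalence)
    r = false_to_true_ratio(SENSITIVITY, SPECIFICITY, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {p:.1%}, "
          f"{r:.1f} false positives per true positive")
```

At 2% prevalence this works out to a PPV of about 17% and five false positives per true positive, which matches the back of the envelope figure; at a 10% pre-test probability the PPV rises above 50%.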
As a meta-point, I do think we let the perfect be the enemy of the good in medicine, and that potentially scares away people who could otherwise make positive contributions. For example, many of the most common screening methods in use today are simple, linear models with c-statistics below 0.8. You can build a far-from-perfect system and still improve dramatically over how people receive healthcare today.
My overall message to machine learning practitioners sitting on the sidelines would be: please join our field. The status quo in medicine is much more primitive than we have been led to believe, and your skills can very literally save lives.