Hacker News
by ajb 335 days ago
Here's what may seem like an unrelated question in response: how can we get 10^7+ bits of information out of the human body every day?

There are a lot of companies right now trying to apply AI to health, but what they are ignoring is that there is orders of magnitude less health data per person than there are cat pictures. (My phone probably contains 10^10 bits of cat pictures and my health record probably 10^3 bits, if that.) But it's not wrong to try to apply AI, because we know that all processes leak information, including biological ones; and ML is a generic tool for extracting signal from noise, given sufficient data.

But our health information gathering systems are engineered to deal with individual, very specific hypotheses generated by experts, which require high-quality measurements of specific individual metrics that some expert, such as yourself, has figured may be relevant. So we get high-quality data in very small quantities: a few bits per measurement.

Suppose you invent a new cheap sensor for extracting large (10^7+ bits/day) quantities of information about human biochemistry, perhaps from excretions, or blood. You run a longitudinal study collecting this information from a cohort and start training a model to predict every health outcome.
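To put the 10^7 bits/day figure in more familiar units, here is a back-of-envelope sketch. Only the 10^7 figure comes from the comment; the variable names are just for illustration.

```python
# Back-of-envelope conversion of the hypothetical sensor's output rate.
BITS_PER_DAY = 10**7             # figure quoted in the comment
SECONDS_PER_DAY = 24 * 60 * 60

bits_per_second = BITS_PER_DAY / SECONDS_PER_DAY
megabytes_per_day = BITS_PER_DAY / 8 / 1e6      # 8 bits per byte, 10^6 bytes per MB
megabytes_per_year = megabytes_per_day * 365

print(f"{bits_per_second:.0f} bits/s")          # about 116 bits/s
print(f"{megabytes_per_day:.2f} MB/day")        # 1.25 MB/day
print(f"{megabytes_per_year:.0f} MB/year")      # about 456 MB/year
```

So the data volume itself is small by modern standards; the hard part is the sensor, not the storage.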

What properties of the bits collected by such a sensor would make such a process likely to work out? The bits need to be "sufficiently heterogeneous" (but not necessarily independent) and their indexes need to be sufficiently stable (in some sense). What is not required is for specific individual data items to be measured with high quality, because some information about the original that we're interested in (even though we don't know exactly what it is) will leak into the other measurements.
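A toy simulation of the "many low-quality measurements" idea, as a rough sketch only. It assumes things the real problem would not give you: a single hidden binary state, independent Gaussian noise on every reading, and a leak of known sign and equal strength in each reading. The function name and parameters are made up for illustration.

```python
import random

def simulate(n_people=2000, n_sensors=400, signal=0.1, seed=0):
    """Return (single-reading accuracy, pooled accuracy) at guessing a hidden state."""
    rng = random.Random(seed)
    single_hits = 0
    pooled_hits = 0
    for _ in range(n_people):
        state = rng.choice((-1, 1))  # hidden health state we never observe directly
        # Each reading is a weak leak of the state buried in unit-variance noise.
        readings = [signal * state + rng.gauss(0, 1) for _ in range(n_sensors)]
        single_guess = 1 if readings[0] > 0 else -1
        pooled_guess = 1 if sum(readings) > 0 else -1
        single_hits += single_guess == state
        pooled_hits += pooled_guess == state
    return single_hits / n_people, pooled_hits / n_people

single_acc, pooled_acc = simulate()
print(f"one reading: {single_acc:.2f}, 400 readings pooled: {pooled_acc:.2f}")
```

Under these assumptions a single reading is barely better than a coin flip (about 54% in theory), while the sum of 400 such readings should be right about 98% of the time, since the pooled signal-to-noise ratio grows with the square root of the number of readings. With correlated noise or unknown weights the gain would be smaller, and a model would have to learn how to combine the readings from data.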

I predict that designs for such sensors, which cheaply perform large numbers of low-quality measurements, would result in breakthroughs in detection and treatment by allowing ML to be applied to the problem effectively.

5 comments

I think it's a very interesting approach and I highly support such an initiative. The easiest way to get a lot of data out of the body is probably to tap the body's own monitoring system - the sensory nerves.

A chemosensor also sounds useful; it should give concentration over time. The minimally invasive option would be to monitor breath, though the signal is better in blood.

Or perhaps even routine bloodwork could incorporate some form of sequencing and longitudinal data banking. Deep sequencing, which may still be too expensive, generates tons of data that can be useful for things that we don't even know to look for today. Capturing this data could let us retroactively identify meaningful biomarkers or early signals when we have better techniques. That way, each time models/methods improve, prior data becomes newly valuable. Perhaps the same could be said of raw data/readings from instruments running standard tests as well (as opposed to just the final results).

I'd be really curious to see how longitudinal results of sequencing + data banking, plus other routine bloodwork, could lead to early detection and better health outcomes.

Last time someone tried to inject chips into the bloodstream, public opinion didn't handle it too well. In the same way, we would learn a lot by being more cruel to research animals, but most people have other priorities. Good or bad? Who knows? Research meets social constructs.
I am not proposing injecting chips.
Apart from the likely technical infeasibility of your idea in today's society, this would require a humongous and diversified population sample to be meaningful (your 'heterogeneous bits'). This follows directly from the complexity of metabolic pathways you wish to analyze. Socially, you'll only be able to achieve that by not asking your sample for consent. Otherwise you'll have a highly biased sample, which could still be useful but for severely restricted research questions.
There are some pretty big longitudinal studies with consent ("45 and up" has a quarter of a million people, for example; that's big enough that working predictions within the cohort would be a worthwhile health outcome).

There are nevertheless privacy issues, which I did not address as my first comment was already very long, especially for a tangent. Most obviously, people would be consenting to the collection of data whose significance they cannot reasonably foresee.

I do agree that most current AI companies are unlikely to be good stewards of such data, and the current rush to give away health records needs to stop. In a way it's a good thing that health records are currently so limited, since the costs will so obviously outweigh the benefits.

Someone should add a sensor to all those diabetes sensors people have in their arms all day and collect general info. It would obviously bias towards diabetics but that's like half the US population anyways so maybe it wouldn't matter that much.
If you can make the detector that cheap, doctors will love you!