by randcraw 3654 days ago

I work at a big pharma company analyzing image and experimental data. In a prior life I analyzed social cliques from vast numbers of user transactions. In both cases it seemed like greater volumes of data should lead to deeper insights. But as it happened, the amount of useful, actionable information in that data was surprisingly limited.

Often the available sensors/assays failed to detect reliable info. Or the phenomenon of interest depended on too many interacting variables, expressed with too great a dynamic range, for us to detect reliably or model usefully. (The present lull in genomics R&D illustrates this well, as does automated interpretation of signals like EEG and NMR spectra.) And the signals that we can extract are often uninterpretable or sporadic. Alas, gathering more data won't yield more signal. Given the present limit on sensor resolution, you just get more mixed signals.

The potential of all ML is limited by the depth of the data, because discriminating subtler signals depends on that depth. In the domains you mention (medicine, biology, geology, other sciences) I'm convinced we need better sensors more than greater amounts of the same data available now. We also need better hypotheses, which lead to better ideas of where to look and what to look for. In general, ML can't help with that. Until we better imagine how the mechanism might work, our questions remain too vague.

Likewise, I'm afraid that applying ML to most software apps will suffer from the same limited ROI. I suspect that most app and user data is too shallow for mining to add appreciable value, no matter how clever the mining is.