Hacker News new | ask | show | jobs
by randcraw 3655 days ago
I don't think your points are invalid, but I think you overvalue the data that's available and relevant to most programming tasks. And without novel data, ML can offer little novel value.

Google, Facebook, M$ Research, and perhaps Yahoo are extreme outliers. They have zottabytes of broad unstructured text data, so they mine it. Everybody else has megabytes of narrow structured data, most of it commercial transations of their products. That stuff has already been effectively mined by traditional basic OLAP methods. Most/all of the value has been extracted.

Mainstream software apps have yet to show the value of using ML. Such apps have access to very limited data of very narrow relevance. The utility of ML in such domains isn't new; it's classic optimization. Or it's bayesian anticipation. But it's not a game changer. Frankly, the use of ML in most mainstream apps is more likely to add distraction and annoyance as the computer mispredicts your intent -- like Microsoft Bob did.

Maybe "life in the cloud" will create new opportunities for smarter software. But I definitely don't want free apps making their own decisions when to notify me. I guarantee that will get old immediately. So how will this work? Frankly, I can't guess. Like Apple's iAds, programming ML into the mainstream or cloud sounds like an idea that will serve the software / cloud vendor far better than the user.

2 comments

I don't know why randcraw is being downvoted here: his/her points are vital clarifications.

Humans have been gathering and analyzing data for thousands of years. We have _not_ waited for Google's latest ML or neural nets to do analyses. Otherwise I'd be carving this post onto a stone for future generations to peruse.

The valuable and understandable AI, the step that will make a difference, isn't in "big data" - it's in figuring out how to do what those humans have been doing all those thousands of years.

> I don't know why randcraw is being downvoted here

I can't speak for anyone else, but "M$"

Think outside consumer-facing applications. Medicine, biology, geology (oil, gas, and mining), finance, transportation. Tons of data, tons of dollars, and important problems.
I work in a big pharma analyzing image and experimental data. In a prior life I analyzed social cliques from vast numbers of user transactions. In both cases it seems like greater volumes of data should lead to deeper insights. But as it happens, the amount of useful actionable information in that data was surprisingly limited.

Often the available sensors/assays failed to detect reliable info. Or the phenomenon of interest interdepended on too many variables expressed with too great a dynamic range for us to detect reliably or model usefully. (The present lull in genomics R&D illustrates this well, as do automated interpretation of signals like EEG and NMR spectra.) And the signals that we can extract are often uninterpretable or sporadic. Alas, gathering more data won't yield more signal. Given the present limit on sensor resolution, you just get more mixed signals.

The potential of all ML is limited by the depth of the data that are essential for the discrimination of subtler signals. In the domains you mention (medicine, biology, geology, other sciences) I'm convinced we need better sensors more than greater amounts of the same data available now. We need better hypotheses which lead to better ideas of where to look and what to look for. In general, ML can't help with that. Until we better imagine how the mechanism might work, our questions remain too vague.

To wit, I'm afraid that applying ML to most software apps will suffer from the same limited ROI. I suspect that most app and user data is too shallow for mining to add appreciable value, no matter how clever it is.

Most of that data isn't "big data". And most of it has been analyzed thoroughly. Sure, ML will be used to re-analyze it, but with mostly the same results. As randcraw states "Most/all of the value has been extracted."

Only a wild-eyed ML "gold digger" could imagine that there is a vein of gold in those mines. The reality is that, with few exceptions, we'll find more lumps of coal.

Perhaps I should switch from an ML swamp metaphor to an ML mine metaphor? <--Hah! Do that with ML!