

by inputcoffee 2797 days ago

Alternative take: there isn't that much low hanging fruit there.

Hear me out.

"To the person who only has a hammer, everything looks like a nail."

The data in front of you is the data you want to analyze, but it doesn't follow that it is the data you ought to analyze. I predict that most of the data you look at will result in nothing. The null hypothesis will not be rejected in the vast majority of cases.

I think we -- machine learning practitioners -- have a fantasy that the signal is lurking and that if we just employ that one very clever technique it will emerge. Sure, random forests failed, neural nets failed, and the SVR failed, but if I reduce the step size, plug the output of the SVR into the net and change the kernel...

Let me give an example: suppose you want to predict the movement of the stock market using the movement of the stars. Adding more information about the stars, and more techniques, may feel like progress, but it isn't.

Conversely, even a simple piece of information that requires minimal analysis (this company's sales are way up and no one but you knows it) would be very useful in making that prediction.

The first data set is rich, but simply doesn't have the required signal. The second is simple, but has the required signal. The data that is widely available is unlikely to have unextracted signal left in it.


heurist 2796 days ago

I've been selling good data in a particular industry for three years. In this industry at least, the so-called "low-hanging fruit" only seems low-hanging until you realize that the people who could benefit most from the data are the ones who are mentally lazy and least likely to adopt it. Data has the same problems as any other product, and may even be harder, because you need to 1) acquire the data and 2) build tools that reliably solve difficult problems using huge amounts of noisy information...


rademacher 2797 days ago

Isn't there utility in accepting the null hypothesis? Knowing that there is no signal in the data is almost as valuable as knowing the opposite, i.e., it tells you where not to look for information.

I think your example is really justifying a "machine learner" that has some domain expertise and doesn't blindly apply algorithms to some array of numbers.


whatshisface 2797 days ago

I think his argument is that some null hypotheses can be rejected out of hand, but that people are wasting time and effort obtaining evidence that, if they had better priors, would be multiplied by 0.0000000000001 to end up with an insignificant posterior. That's what the astrology example indicates.
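The update described here is Bayes' rule in odds form: posterior odds = prior odds × likelihood ratio. A minimal Python sketch, where the prior of 1e-13 and the likelihood ratios are illustrative assumptions rather than numbers from anyone's analysis, shows how even fairly strong evidence barely moves a near-zero prior:

```python
def posterior_probability(prior: float, likelihood_ratio: float) -> float:
    """Apply Bayes' rule in odds form: posterior odds = prior odds * likelihood ratio."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    # Convert the posterior odds back to a probability.
    return posterior_odds / (1.0 + posterior_odds)


if __name__ == "__main__":
    # Illustrative (assumed) prior for "star positions predict stock prices".
    tiny_prior = 1e-13
    for likelihood_ratio in (10.0, 100.0, 1000.0):
        post = posterior_probability(tiny_prior, likelihood_ratio)
        print(f"likelihood ratio {likelihood_ratio:>6.0f}: posterior = {post:.1e}")
```

Even a likelihood ratio of 1000 only lifts a 1e-13 prior to roughly 1e-10, which is still negligible; the evidence would have to be astronomically strong before the posterior mattered.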


cpb 2797 days ago

The effort to evaluate the null hypothesis can be costly. In the competitive environment found in most hedge funds, how would you allocate credit for accepting the null hypothesis?

As in, if you worked at a data acquisition desk, and spent a quarter churning through terabytes of null hypothesis data, what's your attribution to the fund's performance?


losteric 2796 days ago

I think they're describing the "look-elsewhere effect": https://en.wikipedia.org/wiki/Look-elsewhere_effect (aka https://en.wikipedia.org/wiki/Multiple_comparisons_problem)
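To make the look-elsewhere effect concrete, here is a small Python simulation (an illustration, not something from the thread) that correlates many pure-noise features with a pure-noise target. The feature count, sample size, and alpha = 0.05 are arbitrary assumptions, and the p-values use the Fisher z approximation rather than an exact test. Roughly 5% of the noise features look "significant" at the naive threshold, while a Bonferroni correction (dividing alpha by the number of tests) removes nearly all of them.

```python
import math
import random


def pearson_r(x, y):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def correlation_p_value(r, n):
    """Two-sided p-value for H0: correlation = 0, via the Fisher z-transform.

    Under H0, atanh(r) * sqrt(n - 3) is approximately standard normal.
    """
    z = math.atanh(r) * math.sqrt(n - 3)
    return math.erfc(abs(z) / math.sqrt(2))


def count_false_discoveries(n_features=1000, n_samples=250, alpha=0.05, seed=0):
    """Test pure-noise features against a pure-noise target.

    Returns (naive_hits, bonferroni_hits). Every hit is a false positive,
    because no feature has any real relationship to the target.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    p_values = []
    for _ in range(n_features):
        feature = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
        p_values.append(correlation_p_value(pearson_r(feature, target), n_samples))
    naive_hits = sum(p < alpha for p in p_values)
    bonferroni_hits = sum(p < alpha / n_features for p in p_values)
    return naive_hits, bonferroni_hits


if __name__ == "__main__":
    naive, corrected = count_false_discoveries()
    print(f"naive 'discoveries': {naive} / 1000")
    print(f"after Bonferroni:    {corrected} / 1000")
```

Bonferroni is the bluntest correction: it controls the family-wise error rate at the cost of power, and procedures such as Benjamini-Hochberg trade some of that strictness for more discoveries.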


inputcoffee 2796 days ago

Accepting the null hypothesis has utility only if you have some reason to believe it would not be accepted.

Accepting it per se has no particular value. You could generate several random datasets, and accept/reject the null hypothesis between them ad infinitum.

To put it another way, it's only interesting if it's surprising.


rafiki6 2796 days ago

Bingo. You nailed it. I work in finance. Developed markets have efficient stock markets; they are highly liquid, and there are a lot of people competing for the same profits. With that many players, if there's a profit to be had from a dataset you can buy from a vendor, chances are one of your many competitors has already bought it and found it. This is why we now say don't try to beat the market: you likely can't, and mostly you just need to get lucky by holding the right position when an unforeseen event occurs. There are too many variables at play that we just don't understand. Most firms are buying these datasets to stay relevant, but they make no real difference to their actual investing strategies.


rdlecler1 2797 days ago

This is where you might use a genetic algorithm to learn which data to use for a particular prediction. Good AI won't use all the data; it will trim it down to the signal.


pplonski86 2797 days ago

I would like to see a use case where AI selects a data source that humans would never consider.


rdlecler1 2784 days ago

It's about weighting relative importance, especially in conjunction with multivariate information that may be correlated.


jfoutz 2796 days ago

I read a neat criticism of AI techniques. The author pointed out that humans can pick out a strong signal as well as or better than AI, and that humans could also pick out a signal from an array of weak sources. AI could identify that case with fewer weak signals, but it was hard to trust because it was sometimes wrong.

I wish I could remember the source. I’m sure it was an article here a few years ago. I want to say it was medical diagnosis based on charts.

Anyway, the point was that there is a very narrow valley where AI is useful beyond an expert. That valley is expensive to explore, and there might not be anything there.
