by sciboy 5566 days ago

I used to love data mining. Then I formally learned statistics, causal inference, MCMC and other similar methods. I was amazed at how poor the computer science tools were for anything outside of computer science problems (e.g. search, collaborative filtering). Looking back, I now realise that my amazement was misplaced: tools are good for what they are designed for. How many of the computer scientists who write these tools run experiments or do real exploratory data analysis on data they have collected themselves?

Agree completely with their model selection criteria; model selection is useful when used to compare the evidence for k physically plausible models, but should be treated with extreme caution when exploring an infinite model space.
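
As a rough illustration of comparing evidence across a small, fixed set of candidate models, here is a minimal Python sketch. The data, the two candidates (a constant and a straight line), and the choice of BIC as the criterion are all assumptions made for the example; they are not taken from the article or from eureka.

```python
import math

def gaussian_bic(residuals, n_params):
    """BIC of a least-squares fit with Gaussian errors, up to an additive constant.

    n_params counts only the mean-model parameters; the noise variance is
    shared by every candidate here, so leaving it out does not change the ranking.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + n_params * math.log(n)

def fit_constant(xs, ys):
    """Residuals of the model y = c."""
    mean = sum(ys) / len(ys)
    return [y - mean for y in ys]

def fit_linear(xs, ys):
    """Residuals of the model y = a + b*x, fitted by ordinary least squares."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = my - slope * mx
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# Hypothetical data: a straight line plus a small deterministic wiggle.
xs = [i / 10 for i in range(30)]
ys = [1.0 + 2.0 * x + 0.1 * math.sin(7 * i) for i, x in enumerate(xs)]

# The candidate set is fixed in advance: k = 2 models, each with its parameter count.
candidates = {"constant": (fit_constant, 1), "linear": (fit_linear, 2)}
bic = {name: gaussian_bic(fit(xs, ys), k) for name, (fit, k) in candidates.items()}
best = min(bic, key=bic.get)  # lower BIC means more support

for name, value in sorted(bic.items(), key=lambda item: item[1]):
    print(f"{name:10s} BIC = {value:8.2f}")
print("preferred:", best)
```

With only k candidates chosen beforehand, the penalty term has a clear meaning. Once the search ranges over an open-ended space of expressions, the same score gets reused across a huge number of implicit comparisons, which is where the caution applies.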

Personally, I hold very little hope for automated tools anymore. Considering how complex a seemingly simple study is to analyse correctly, or how hard it is to model physically realistic processes, I think the future does not bode well for tools like eureka as they are currently targeted. At best eureka may present some general hypotheses, but it seems unlikely to be able to search the model space for models that are physically plausible, except in the simplest of cases.

Besides, try eureka on some of your real data and prepare to be deflated :)