by whistlerbrk 3987 days ago
> Good data science is not based on collecting large amounts of data passively and then mining it mindlessly. You need to ask right questions and design data collection and modeling process based on those questions.

This resonates, especially when it comes to picking and designing features. The same goes for understanding dependent variables and knowing how to test for them; getting this wrong is the biggest mistake leading to flawed conclusions that I see from the 'general public'.

1 comment

What do you mean by testing for dependent variables?
Maybe something to do with instrumental variables? https://en.wikipedia.org/wiki/Instrumental_variable