| HN Mirror


by neel8986 4775 days ago

1) Size of data: Most econometricians work on small data sets (mostly in MBs) that they can keep in RAM and analyze with R and Excel. Modern data scientists, however, have to deal with GBs (sometimes TBs or even PBs) of data. For data that large you need multiple machines, or even hundreds of machines, so you need to be good at distributed computing and frameworks like Hadoop, Hive, etc.

2) Visualization: Such large data sets cannot always be expressed in bar charts or pie charts, so standard charting tools like Excel and R don't work. You need good knowledge of charting libraries like D3 or OpenGL (for 3D visualization) to analyze the data and express your findings.

4) Type of data: Econometricians are never comfortable with unstructured data sets consisting of Twitter feeds and Apache logs. Good knowledge of machine learning and graph algorithms is becoming essential. Apache Mahout, a machine learning framework built on top of Hadoop, looks extremely promising.
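
To make point 1 a bit more concrete: Hadoop's programming model is MapReduce, and the rough shape of it can be sketched in plain Python. This is only an illustration of the map / shuffle / reduce pattern on one machine, not Hadoop's API. The function names and the sample log lines are made up for the example, and each inner list stands in for a file split that would live on a different machine.

```python
from collections import defaultdict
from itertools import chain


def map_phase(chunk):
    """Emit a (token, 1) pair for every whitespace-separated token in one chunk."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as a framework would between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk independently, then shuffle and reduce."""
    mapped = chain.from_iterable(map_phase(chunk) for chunk in chunks)
    return reduce_phase(shuffle(mapped))


# Hypothetical data: two chunks standing in for splits on different machines.
chunks = [
    ["GET /index.html", "GET /about.html"],
    ["POST /login", "GET /index.html"],
]
counts = word_count(chunks)
```

Because the map step only ever looks at one chunk, the chunks can be processed on separate machines, and only the grouped intermediate pairs need to move between them.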

1 comments

mc-lovin 4775 days ago

I would also add that econometricians are highly focused, almost exclusively focused in fact, on finding causal relationships.

This means that descriptive work, such as clustering or dimension reduction, is often either ignored or considered a kind of pre-processing before the real work starts.


Fomite 4775 days ago

I think this is a big one, and one of the reasons I would be uncomfortable calling myself a "data scientist" despite meeting some of the more tool-oriented definitions - my work has a much larger focus on attempting to infer causality.
