by neel8986
4775 days ago
1) Size of data: Most econometricians work on small data sets (mostly in MBs) which they can keep in RAM and analyze with R and Excel. Modern data scientists have to deal with GBs (sometimes TBs or even PBs) of data. For data that large you need multiple machines, sometimes hundreds, so you need to be good at distributed computing and frameworks like Hadoop, Hive, etc.

2) Visualization: Such large data sets cannot always be expressed in bar charts or pie charts, so standard charting tools like Excel and R don't work. You need good knowledge of libraries like d3 or OpenGL (for 3D visualization) to analyze the data and express your findings.

3) Type of data: Econometricians are never comfortable with unstructured data sets consisting of Twitter feeds and Apache logs. Good knowledge of machine learning and graph algorithms is becoming essential. Apache Mahout, a machine learning framework built on Hadoop, is looking extremely promising.
This means that descriptive work, such as clustering or dimension reduction, is often either ignored or treated as a kind of pre-processing before the real work starts.