by dangerlibrary 4774 days ago
Things I didn't learn in my econometrics classes that are used all over in data science:

1) Machine learning techniques for analysing data sets as opposed to parametric models

2) Clustering (k-means, etc.)

3) TF / IDF

4) Using a variety of data sources / tools - my econometrics education was heavily Stata-dependent. Learn a little bit of SQL, R, and Matlab so that getting up to speed doesn't take you longer than a month.

1 comment

Thanks. What do you mean by TF / IDF?
It's a method of determining which words in an arbitrary collection of documents (tweets, for instance) are most important when classifying those documents.

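It stands for term frequency-inverse document frequency. It assigns each word a score that goes up the more often the word appears in a given document and goes down the more documents in the whole collection contain it, so words that show up everywhere (like "the") end up scoring close to zero.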

https://en.wikipedia.org/wiki/Tf%E2%80%93idf
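
If it helps, here is a rough sketch of the scoring in plain Python. It assumes one common weighting (term count divided by document length, times log(N / document frequency)); other variants exist, and the function name `tf_idf` and the toy documents are made up for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per document.

    docs is a list of token lists. Uses tf = count / document length and
    idf = log(N / df), which is just one of several common weightings.
    """
    n_docs = len(docs)
    # Document frequency: how many documents contain each word at least once.
    df = Counter(word for doc in docs for word in set(doc))
    scores = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        scores.append({
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return scores


# Toy collection, invented for this example.
docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the cat chased the dog".split(),
]
scores = tf_idf(docs)
```

With these toy documents, "the" appears in every document, so its score is zero everywhere, while "mat" (only in the first document) scores higher there than "cat" (which appears in two documents).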