by tmarthal 4159 days ago
Right. Using pandas may let you process more data than an Excel spreadsheet, but if the file fits on your laptop then it's inherently not very big. It's a great tool, but it is a data exploration tool, not a data processing workflow tool.

IMHO, there are too many 'data scientists' nowadays that are taking averages and calling it 'analytics'. If the call you are making exists in a library, more than likely it's not "sophisticated".

2 comments

> If the call you are making exists in a library, more than likely it's not "sophisticated".

What a disparaging comment. The whole point of good software design is that many of the algorithms you may need are packaged up for ease of use. The best ones are highly configurable through the parameters you supply. "Sophisticated" should play no role in a data scientist's workday: results should be verifiable and understandable, and any data analysis pipelines should be extensible and repeatable. Writing their own deep-learning implementation does not a good data scientist make.

While the engineer in me agrees, the business person in me says it really doesn't matter what techniques are used as long as the data scientist is providing value. If all it takes is some averaging to save a company a good chunk of money, then that person may be earning their pay.
Averaging the right things in the right way is important too--such as averages within clusters.
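
To make the cluster point concrete, here is a rough sketch using only the Python standard library. The records, the cluster labels, and the `cluster_means` helper are all made up for illustration; nothing here comes from a real dataset or from the thread above.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (cluster label, value) records, invented for this example.
records = [
    ("small", 10.0),
    ("small", 12.0),
    ("small", 11.0),
    ("large", 100.0),
    ("large", 140.0),
]

def cluster_means(rows):
    """Group values by cluster label and return the mean of each group."""
    groups = defaultdict(list)
    for label, value in rows:
        groups[label].append(value)
    return {label: mean(values) for label, values in groups.items()}

# One global average over every record, ignoring cluster membership.
overall = mean(value for _, value in records)

# One average per cluster.
per_cluster = cluster_means(records)

print(overall)
print(per_cluster)
```

With this toy data the global average falls between the two groups and describes neither of them, while the per-cluster averages describe each group directly. In pandas the same idea would be a group-by followed by a mean.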