by rch
4521 days ago
You know, I absolutely see where the poster is coming from, and the suggestions look helpful so far, but the question might as well read: What journals and blogs should I be reading to become a cardiothoracic surgeon? (Though hopefully nobody bleeds out on a table when someone misconstrues statistical data.) We've lived through an amazing time where one could learn by doing, and talented people have been able to compete without the benefit of formal education (myself included), but in my opinion those days are numbered. I've personally observed respected PhD statisticians stumble on the type of problems a data scientist is expected to address. The combination of complex software and often counterintuitive mathematics makes this an imposing field for all but perhaps the top one percent of practitioners. Most everybody else needs to really hit the books for a few years, in a formal setting. With that pre-coffee rant out of the way, I'm looking forward to finding some new sources here myself. So, in that spirit, thanks for the question.
"Data kiddies like me are coming. I just ran multiple passes of the Broyden–Fletcher–Goldfarb–Shanno algorithm with a 100-layer neural network on a TF-IDF-vectorized dataset. I have no clue what that all exactly means, all I know is that it took under an hour and it gives a higher (top 10%) AUC score. Kaggler amateurs are beating the academics by brute force or smarter use of the many tools that are currently freely available. Show a regular Python dev some examples and library docs and she can compete in ML competitions. I was getting good results with LibSVM before I even understood how SVMs work on the surface. Feed the correct input format and some parameters and you are good to go. Random Forests can be applied to nearly anything and get you 75%+ accuracy. Maybe I am just an engineer looking for pragmatic and practical use of techniques from ML and data science. Hard data scientists will be the statisticians, the algorithmic theory experts, the experimental physicists. It takes me 7 years to understand a complex mathematical paper. It takes me 7 minutes to train a model and predict a 1 million test set with Vowpal Wabbit."
The point is that a data scientist is really a blend of statistician and software engineer. Sure, there are brilliant people who will invent new ML algorithms, but you don't need to invent that stuff to be of tremendous value to a business that has data it isn't currently getting much value out of. In the same way, a software engineer at a small business doesn't need to write a database; she just needs to be able to implement one somebody else wrote to add tremendous value.
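For anyone who shares the "no clue what that all exactly means" feeling about the terms in the quote, here is a rough, standard-library-only sketch of two of them: TF-IDF weighting and the AUC score. It is not what LibSVM, Vowpal Wabbit, or any Kaggle pipeline actually runs; the function names `tfidf` and `auc` are my own, the tokenizer is a naive whitespace split, and the IDF is the plain unsmoothed `log(N / df)` variant (real libraries usually smooth and normalize).

```python
import math
from collections import Counter


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    # Naive tokenizer (an assumption): lowercase, split on whitespace.
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return vectors


def auc(labels, scores):
    """Chance that a random positive outscores a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    # Brute-force pairwise comparison: O(len(pos) * len(neg)), fine for a demo.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


docs = ["spam spam offer", "meeting notes", "offer meeting"]
vectors = tfidf(docs)
# "spam" is frequent in the first document and rare elsewhere, so it outweighs "offer".
print(vectors[0])
print(auc([1, 0, 1, 0], [0.9, 0.1, 0.4, 0.6]))
```

The pairwise AUC is quadratic, so it would crawl on a million-row test set; the libraries mentioned above exist precisely so you don't have to write (or optimize) this yourself, which is rather the point of the comment.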