by gghyslain 3467 days ago
Thanks everyone for the positive feedback. I have not had much time yet to write full reviews of all the books, but I'll work on it - so far this page is more of a personal "bookmark". But to reply to @Nekopa and @carlsednaoui, here is a short review of the first few books.

I have taken a really pragmatic approach to reading them - focusing first only on the parts relevant to my projects.

# An Introduction to Statistical Learning (ISL) / The Elements of Statistical Learning (ESL)

I focused on chapters 8-9 of ISL, on Tree-Based Methods and SVMs, two algorithms I used for my dissertation project. I found that ISL provides very clear explanations of the algorithms with just enough mathematical formalism.

I have a good math background, so ESL was interesting to go through. But I am more of a practical person, and I found ISL better suited to me when it came down to working on my project and supporting my choices.

# Python Machine Learning

Really great hands-on book! Sebastian Raschka does a good job of guiding you through all the steps of an ML project: data pre-processing, feature engineering, model selection... - all the steps are defined and covered with practical examples.

I strongly recommend this book if you are just starting out with ML and feel "lost" about how to start your own project.
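
To make those steps a bit more concrete, here is a minimal sketch in plain Python (standard library only). It is not taken from the book: the toy dataset, the function names, the product feature and the choice of k-nearest neighbours are all assumptions I made up for illustration. It just shows the order of the steps: feature engineering, pre-processing fitted on the training folds only, and model selection by cross-validation.

```python
import random
import statistics


def make_toy_data(n=60, seed=0):
    """Hypothetical two-class dataset: two Gaussian blobs in 2D."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 3.0 * label
        X.append([rng.gauss(center, 1.0), rng.gauss(center, 1.0)])
        y.append(label)
    return X, y


def add_product_feature(rows):
    """Feature engineering: append the product of the two raw columns."""
    return [row + [row[0] * row[1]] for row in rows]


def fit_scaler(rows):
    """Pre-processing: learn per-column mean and standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.mean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by 0
    return means, stds


def scale(rows, scaler):
    """Apply a previously fitted scaler to some rows."""
    means, stds = scaler
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k nearest training points (use odd k)."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))

    nearest = sorted(range(len(train_X)), key=lambda i: sq_dist(train_X[i], x))[:k]
    votes = [train_y[i] for i in nearest]
    return max(set(votes), key=votes.count)


def cv_accuracy(X, y, k, folds=5):
    """Cross-validated accuracy; the scaler is fitted on training folds only."""
    n = len(X)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))
        train_idx = [i for i in range(n) if i not in test_idx]
        scaler = fit_scaler([X[i] for i in train_idx])
        train_X = scale([X[i] for i in train_idx], scaler)
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            x = scale([X[i]], scaler)[0]
            correct += knn_predict(train_X, train_y, x, k) == y[i]
    return correct / n


def select_k(X, y, candidates=(1, 3, 5)):
    """Model selection: keep the k with the best cross-validated accuracy."""
    scores = {k: cv_accuracy(X, y, k) for k in candidates}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    X, y = make_toy_data()
    X = add_product_feature(X)
    best_k, scores = select_k(X, y)
    print("best k:", best_k, "scores:", scores)
```

In a real project you would use a library for all of this rather than hand-rolled functions, but the structure stays the same.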

# Taming Text

I decided to use text data I had available for my dissertation project. However, halfway through the book I realized my dataset was too small to apply any of the techniques described there. I still like the practical approach, and in the end the book gave me a good idea of what can be done with text.

# Advanced Analytics with Spark

I picked up this book once I started working on putting my project into production - we use Apache Spark (Scala) at work.

It gave me a good introduction to Spark, BUT it's based on the RDD API, and as stated on the Spark website: "As of Spark 2.0, the RDD-based APIs in the spark.mllib package have entered maintenance mode. The primary Machine Learning API for Spark is now the DataFrame-based API in the spark.ml package."

I'm now mostly relying on the Spark docs / API. I'm not aware of any up-to-date books yet :)