ML is easy to get set up, but often difficult to debug if you don't really understand the details. Within the last week I pair-reviewed a recommender system written in MLlib (à la this post http://spark.apache.org/docs/latest/mllib-collaborative-filt...) that was doing strange things despite performing well on a test set. It turned out the metric used on that page was not a good one for our purposes, and the algorithm had zoomed in on a degenerate solution that nailed the test score. This was clear to me after about 2 minutes of looking at the auxiliary matrices it generated. The less experienced person I was helping did not know how to proceed.
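To give a rough idea of what "looking at the auxiliary matrices" can mean, here is a toy sketch in plain Python. It assumes you have pulled the user and item factor matrices out of the model (spark.mllib's MatrixFactorizationModel exposes them as userFeatures and productFeatures) into lists of vectors. The helper names, the tolerance, and the failure mode shown (every observed rating is the same value, so near-identical factors give a perfect squared error) are hypothetical illustrations, not necessarily what happened in the system above.

```python
import statistics


def predicted_scores(user_factors, item_factors):
    """Dot product of every user vector with every item vector (hypothetical helper)."""
    return [
        sum(u * v for u, v in zip(uf, itf))
        for uf in user_factors
        for itf in item_factors
    ]


def mse(ratings, user_factors, item_factors):
    """Mean squared error over (user_index, item_index, rating) triples."""
    errors = [
        (r - sum(u * v for u, v in zip(user_factors[i], item_factors[j]))) ** 2
        for i, j, r in ratings
    ]
    return sum(errors) / len(errors)


def looks_degenerate(user_factors, item_factors, tol=1e-3):
    """Flag factors whose predictions barely vary across user/item pairs.

    A near-constant score matrix can still score well on a test set
    (e.g. when all observed ratings are equal) while giving useless rankings.
    The tolerance is an arbitrary assumption.
    """
    return statistics.pstdev(predicted_scores(user_factors, item_factors)) < tol


if __name__ == "__main__":
    # Toy data: every observed rating is 1.0 (hypothetical).
    ratings = [(0, 0, 1.0), (1, 1, 1.0), (2, 0, 1.0)]
    # Collapsed factors: every user and every item gets the same vector.
    users = [[0.5, 0.5]] * 3
    items = [[1.0, 1.0]] * 2
    print(mse(ratings, users, items))         # 0.0, a "perfect" score
    print(looks_degenerate(users, items))     # True, every prediction is 1.0
```

The error metric alone says this model is flawless; the factor matrices make it obvious that it predicts the same score for everything and cannot rank anything.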