If you are interested in this, you might also like my (warning: shameless plug) five-part blog series at http://blog.caseystella.com/pyspark-openpayments-analysis.ht.... I use the Python bindings for Spark to illustrate data analysis of healthcare financial data on Hadoop.