I believe one of the key advantages of Spark over Hadoop is being able to run the full stack on a small environment (single machine) and do all the coding there without the need of a cluster just for development.
This is why I like cascading [1]. It has a higher level API on Hadoop. It also works in local mode with little change. I've actually used out to do transformation work from local files (csv), join them into structured documents and dump them into ArangoDB. I liked it so much I wrote a third party library to work ArangoDB in Hadoop[2].
This is a big sell for me. There are some small catches to it - for example, for operations requiring an associative function (such as reduceByKey), the need for it to be associative may not arise until the data-set becomes suitably large to be split across multiple workers, so in my team our testers specifically check reduce function associativity has been demonstrated in a unit test.
1 cascading.org/ 2 https://github.com/deusdat/guacaphant