We've used this dataset to build a product review classification pipeline, as an example of an application that can be developed with our project, KeystoneML (which runs on Spark). The code is here: https://github.com/amplab/keystone/blob/master/src/main/scal...