by ma2rten 2013 days ago
I hadn't used AWS in about 4 years, but I recently started using it for a side project. I needed to process a big dataset and I wanted to use pyspark, so I gave EMR a try. I was impressed by how easy it seemed to create clusters in the UI and then run jobs using an ipython notebook.

That is, until I realized that nothing worked. You had to use a version that was 5 versions behind the current one, which is the opposite of what the documentation said: it explicitly said not to use that version. Even then, not everything worked out of the box.

1 comments

We had to do a security review of EMR recently. I’m amazed it works at all. Hop on one of the cluster nodes and take a look at the processes running.
It was probably an issue specific to notebooks.