by gknight 4554 days ago
Have you tried Apache Hive? I believe it was designed to make Hadoop easier to use through SQL-like queries. Something like Qubole might help too.

I have; I should've mentioned it as an option. But I find it much easier to think in Scala than in an SQL-like language.
Pig is another option. It lets you run SQL-like Pig Latin commands in the Grunt shell, which makes Hadoop a lot easier to use.
Going from Hive/Pig to Spark substantially improves developer productivity (for non-reporting/BI workloads). You can properly unit test your program, use a debugger, and keep all your code in one place and in one language (rather than, as with Pig, writing UDFs in Java and then using a pseudo-scripting language to specify the workflow).

And those are just the productivity gains, not to mention the performance gains you get from moving from MapReduce to Spark.
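To make the unit-testing point concrete, here is a minimal Scala sketch under stated assumptions: the tab-separated log format, the `Hit` record and the `parseHits`/`bytesPerUser` functions are all hypothetical, invented for illustration. The idea is that the transformation logic lives in ordinary functions over `Iterator`, so it can be checked with plain collections and `assert`, with no cluster involved. In a real Spark job, a function of type `Iterator[T] => Iterator[U]` such as `parseHits` could be handed to `RDD.mapPartitions`.

```scala
import scala.util.Try

object LogParsing {
  // Hypothetical record type, for illustration only.
  final case class Hit(user: String, bytes: Long)

  // Parses hypothetical "user<TAB>bytes" lines, skipping any line that
  // does not have exactly two fields or whose second field is not a Long.
  def parseHits(lines: Iterator[String]): Iterator[Hit] =
    lines.flatMap { line =>
      line.split('\t') match {
        case Array(user, bytes) => Try(bytes.toLong).toOption.map(b => Hit(user, b))
        case _                  => None
      }
    }

  // Sums bytes per user locally; a stand-in for what reduceByKey
  // would do across partitions on a cluster.
  def bytesPerUser(hits: Iterator[Hit]): Map[String, Long] =
    hits.foldLeft(Map.empty[String, Long]) { (acc, h) =>
      acc.updated(h.user, acc.getOrElse(h.user, 0L) + h.bytes)
    }

  def main(args: Array[String]): Unit = {
    // A plain-collection "unit test": no Hadoop or Spark needed.
    val input  = Iterator("alice\t100", "bob\t50", "garbage", "alice\t25")
    val totals = bytesPerUser(parseHits(input))
    assert(totals == Map("alice" -> 125L, "bob" -> 50L))
    println(totals)
  }
}
```

The same functions can be exercised from any test framework, or stepped through in a debugger, exactly like other Scala code; only the thin layer that wires them into an RDD depends on Spark.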

There's also Shark, which is Hive implemented on top of Spark instead of Hadoop MapReduce.