Hacker News
by threeseed 3740 days ago
The data format is important: ORC and Parquet are substantially faster than text or sequence files.

But the query engine matters far more for performance. Spend any time with SparkSQL and then Hive and you'll know what I mean.