Hacker News
by StreamBright 1985 days ago
Depends. Just some random mixture of stacks: PrestoDB, S3, Airflow, Luigi, Dremio, Athena, Hive LLAP, EMC Isilon, Kafka.

My favorite so far is S3 + PrestoDB with either ORC or Parquet files. It is a solid DWH solution for most enterprises on the cloud (cloud or not is a different discussion). It works from small scale (50TB) to really high scale (50PB). It has some gotchas and moving parts, but very few compared to Hadoop + co. You can combine it with Kafka for streaming data and you've got yourself a pretty solid data solution.
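For anyone who hasn't seen the S3 + PrestoDB setup, here is a rough sketch of what "a table" means in that world: it is just a DDL statement pointing Presto's Hive connector at a prefix of ORC or Parquet files. The helper below only builds the SQL string. The catalog/schema/table name, the bucket, and the columns are made up for illustration, and it assumes a Hive connector catalog is already configured; `format`, `external_location` and `partitioned_by` are the Hive connector table properties it relies on.

```python
def presto_external_table_ddl(table, columns, location, fmt="PARQUET", partitioned_by=()):
    """Build a CREATE TABLE statement for an external table in Presto's Hive connector.

    table          -- fully qualified name, e.g. "hive.dwh.events" (hypothetical)
    columns        -- list of (name, presto_type) pairs
    location       -- S3 prefix holding the files (hypothetical bucket)
    fmt            -- "PARQUET" or "ORC"
    partitioned_by -- names of partition columns, in partition order
    """
    if fmt not in ("PARQUET", "ORC"):
        raise ValueError(f"unsupported format for this sketch: {fmt}")

    types = dict(columns)
    missing = [p for p in partitioned_by if p not in types]
    if missing:
        raise ValueError(f"partition columns not declared: {missing}")

    # The Hive connector expects partition columns to be the last columns
    # in the table definition, in the same order as partitioned_by.
    ordered = [(n, t) for n, t in columns if n not in partitioned_by]
    ordered += [(p, types[p]) for p in partitioned_by]

    col_sql = ",\n".join(f"    {name} {typ}" for name, typ in ordered)

    props = [f"format = '{fmt}'", f"external_location = '{location}'"]
    if partitioned_by:
        quoted = ", ".join(f"'{p}'" for p in partitioned_by)
        props.append(f"partitioned_by = ARRAY[{quoted}]")
    prop_sql = ",\n".join(f"    {p}" for p in props)

    return f"CREATE TABLE {table} (\n{col_sql}\n)\nWITH (\n{prop_sql}\n)"


if __name__ == "__main__":
    # Hypothetical example: a daily-partitioned events table stored as ORC.
    print(presto_external_table_ddl(
        "hive.dwh.events",
        [("dt", "varchar"), ("user_id", "bigint"), ("payload", "varchar")],
        "s3://example-bucket/events/",
        fmt="ORC",
        partitioned_by=("dt",),
    ))
```

You would paste the resulting statement into the Presto CLI or send it through whatever client you use. Existing partitions under the prefix still have to be registered in the metastore before queries see them, which is one of the few gotchas.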