| HN Mirror



	by autokad 2912 days ago
	Usually I am joining many different data sets, many of which include some type of log data (sometimes petabytes in size, but usually a few TB). The logs are persisted to HDFS or S3, which is why Spark and Hive are such a nice way of doing the work compared to something like Postgres. Also, it's nice to plop JSON, Avro, CSV, Parquet, or whatever data in storage and just query/join/analyze it. No need to put the story on hold because you are waiting for the Oracle DBA to increase space again.
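
To make the "plop files in storage and just join them" idea concrete, here is a rough sketch in plain Python. It is not Spark or Hive code: it is a single-machine stand-in that reads a JSON-lines log and a CSV lookup file as they are, with no table created up front, and inner-joins them on a shared key. The file names, the `user_id` column, the sample rows, and the helper names (`read_json_lines`, `read_csv_rows`, `hash_join`, `demo`) are all made up for illustration, and a temp directory stands in for HDFS/S3.

```python
import csv
import json
import tempfile
from pathlib import Path


def read_json_lines(path):
    """Parse one JSON object per line; the schema is whatever the file contains."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def read_csv_rows(path):
    """Read a CSV with a header row into a list of dicts (all values are strings)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def hash_join(left, right, key):
    """Inner-join two lists of dicts on `key` using an in-memory hash table."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def demo():
    # Hypothetical sample data, written to a temp dir standing in for HDFS/S3.
    with tempfile.TemporaryDirectory() as d:
        logs = Path(d) / "logs.json"
        users = Path(d) / "users.csv"
        logs.write_text(
            '{"user_id": "1", "event": "login"}\n'
            '{"user_id": "2", "event": "click"}\n'
            '{"user_id": "3", "event": "click"}\n',
            encoding="utf-8",
        )
        users.write_text("user_id,country\n1,US\n2,DE\n", encoding="utf-8")
        # Files are read as-is and joined; user 3 has no match and is dropped.
        return hash_join(read_json_lines(logs), read_csv_rows(users), "user_id")


if __name__ == "__main__":
    for row in demo():
        print(row)
```

At a few TB the same shape of work would be handed to an engine like Spark or Hive that distributes the read and the join across a cluster; the point of the sketch is only that the data stays in its original files and the schema is discovered at read time.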