by autokad
2912 days ago
Usually I'm joining many different data sets, many of which include some type of log data (sometimes petabytes in size, but usually a few TB). The logs are persisted to HDFS or S3, which is why Spark and Hive make such a nice way of doing work compared to something like Postgres. Also, it's nice to plop JSON, Avro, CSV, Parquet, or whatever data in storage and just query/join/analyze it. No need to put the story on hold because you're waiting for the Oracle DBA to increase space again.