by gtrubetskoy
3743 days ago
Actually, you might not want to choose a database at all, and instead focus on deciding on the data format, such as Parquet (http://parquet.io) or Avro (https://avro.apache.org/). Many tools, such as Hive, Impala and Spark, support these formats natively. You will also need to think about the schema, partitioning, compression and other parameters, and those are not trivial decisions.

But the query engine matters far more for performance. Spend some time with SparkSQL and then with Hive and you'll see what I mean.
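
To make the partitioning and compression point a bit more concrete, here is a rough sketch of the `key=value` directory layout that Hive-style partitioning uses. Everything in it is an assumption for illustration: the function names, the `part-00000.json.gz` file name and the `day` column are made up, and it writes gzip-compressed JSON lines with the Python standard library only to stay dependency-free. With real data you would write Parquet or Avro through a proper library instead.

```python
import gzip
import json
from collections import defaultdict
from pathlib import Path


def write_partitioned(records, root, partition_key):
    """Write records as gzip-compressed JSON lines, one directory per partition value.

    Directories are named `key=value` (Hive-style). The file name is hypothetical.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec[partition_key]].append(rec)

    paths = []
    for value, rows in sorted(groups.items()):
        part_dir = Path(root) / f"{partition_key}={value}"
        part_dir.mkdir(parents=True, exist_ok=True)
        path = part_dir / "part-00000.json.gz"
        with gzip.open(path, "wt", encoding="utf-8") as fh:
            for row in rows:
                # The partition value is encoded in the path, so drop it from the row.
                slim = {k: v for k, v in row.items() if k != partition_key}
                fh.write(json.dumps(slim) + "\n")
        paths.append(path)
    return paths


def read_partition(root, partition_key, value):
    """Read a single partition; files in other directories are never opened."""
    path = Path(root) / f"{partition_key}={value}" / "part-00000.json.gz"
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return [json.loads(line) for line in fh]
```

The choice of partition column is the non-trivial part: a query that filters on it only touches one directory, while a query that filters on anything else still has to scan everything.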