Hacker News new | ask | show | jobs
by dmitrig01 2333 days ago
Hive and Impala are databases.

Presto and SparkSQL are SQL interfaces to many different datasources, including Hive and Impala, but also any SQL database such as Postgres/Redis/etc, and many other types of databases, such as Cassandra and Redis; the SQL tools can query all these different types of databases with a unified SQL interface, and even do joins across them.

The difference between Presto and SparkSQL is that Presto is run on a multi-tenant cluster with automatic resource allocation. SparkSQL jobs tend to have to be allocated with a specific resource allocation ahead of time. This makes Presto is (in my experience) a little more user-friendly. On the other hand, SparkSQL has better support for writing data to different datasources, whereas Presto pretty much only supports collecting results from a client or writing data into Hive.

1 comments

I think some of this might be misinformed.

I know Hive can definitely query other datasources like traditional SQL databases, redis, cassandra, hbase, elasticsearch, etc, etc. I thought Impala had some bit of support for this as well, though I'm less familiar with it.

And SparkSQL can be run on a multi-tenant cluster with automatic resource allocation - Mesos, YARN, or Kubernetes.