Apache Drill is an interesting project. Of all the MPP engines that appeared a few years ago, it was the most similar to BigQuery (the first public version) and the most flexible. However, the competition was fierce, and each Big Data vendor (MapR, Cloudera and HortonWorks) was pushing its own solution: Drill, Impala and Hive on Tez, respectively. Competition is always a good thing, but it fragmented the user base too much, so no clear winner emerged. At the same time, Spark SQL got sufficiently better to replace these tools in most use cases, and Presto (from Facebook) got the traction and the user base that none of these projects had by being vendor-agnostic (its adoption by AWS in Athena and EMR also helped boost its popularity).
The reality is that nowadays both SparkSQL and Presto are way behind Hive in terms of both speed and maturity. Hive has made tremendous progress since 2015 (with the introduction of LLAP), while SparkSQL still has stability issues with fault tolerance and shuffling. (Presto does not support fault tolerance.) So, IMO, SparkSQL is nowhere near ready to replace Hive.
If you are curious about the performance of these systems, see [1] and [2], which compare Hive, SparkSQL, and Presto. Disclaimer: we are developing MR3, which is mentioned in the articles. However, we tried to make the performance evaluation a fair comparison.
[1] https://mr3.postech.ac.kr/blog/2019/11/07/sparksql2.3.2-0.10... [2] https://mr3.postech.ac.kr/blog/2019/08/22/comparison-presto3...