by epdlxjmonad
2225 days ago
There is a common belief that SparkSQL is better than Hive because SparkSQL uses in-memory computing while Hive is disk-based. Another common belief is that Presto is better than Hive because it is based on an MPP design and was created in the early 2010s by the very company (Facebook) that created Hive, for the very purpose of overcoming Hive's slow speed. The reality is that nowadays both SparkSQL and Presto are way behind Hive, in terms of both speed and maturity. Hive has made tremendous progress since 2015 (with the introduction of LLAP), while SparkSQL still has stability issues with fault tolerance and shuffling. (Presto does not support fault tolerance.) So, IMO, SparkSQL is nowhere near ready to replace Hive.

If you are curious about the performance of these systems, see [1] and [2], which compare Hive, SparkSQL, and Presto.

Disclaimer: We are developing MR3, which is mentioned in the articles. However, we tried to make a fair comparison in the performance evaluation.

[1] https://mr3.postech.ac.kr/blog/2019/11/07/sparksql2.3.2-0.10...
[2] https://mr3.postech.ac.kr/blog/2019/08/22/comparison-presto3...
On the other hand, our Presto cluster runs pretty much anything we throw at it, and when it fails, the failures are easier to anticipate and mitigate. It's also quite simple to deploy and operate.