This got me curious about our Exasol environment, which we've been running since 2016 at Piedmont Healthcare. We average 2 million queries per day (DDL/DML/DQL). Our query failure rate is ~0.1%. Only 7% of those failures were due to hitting resource limits. The rest were SQL issues: constraint errors, data type issues, etc. Average connected users is ~400. Average concurrent queries is ~7 with a daily max average of ~78 concurrent queries. Avg query time across DQL statements is around 10 seconds, which is only that high due to some extreme outliers -- I have users that like to put 200k values in a WHERE clause IN statement, and Tableau sometimes likes to write gnarly SQL with LOD calcs and relationship models.
TPC-H benchmarks are what convinced us to purchase Exasol 10 years ago. Still happy with that decision! Congrats to the Exasol team on these results vs ClickHouse.
As far as bottlenecks, I haven't really hit any in the 10 years we've been using it. Any bottleneck pain points are really user induced. We had some initial system instability during our upgrade from v7 to v8, which was a significant platform architecture change. Those issues have now been resolved. Exasol has pretty good support.
Regarding the 200k values in a where clause, we have some users that will do research across published data source in Tableau. They will copy account IDs from one report and paste them into a filter in another. Our connections from Tableau to Exasol are live. Tableau doesn't have great guardrails on the SQL that gets issued to the database.
The concurrent query comes from a daily statistics table in Exasol. There is an average and max concurrency measure aggregated per day. I averaged the last 30 days. Exasol doesn't really explain their sampling methodology in their documentation: https://docs.exasol.com/db/latest/sql_references/system_tabl...
When I search for "high performance analytical database" in Bing, Ai summarized results are ClickHouse, Apache Druid, Singlestore, Couchbase, and Apache Pinot are considered among the best databases for real-time analytics due to their low query latency and high performance.
In Google, Ai summarized results are ClickHouse, StarRocks, Snowflake, and Google BigQuery.
Clickhouse is there in both of them and Exasol is not mentioned. If these claims were relevant, why is it not in the limelight?
Clickhouse is known to ingest and analyze massive volumes of time-series data in real-time. How good is Exasol for this use case?
You need to look at use-case alignment as well as performance.
Apache Pinot, Druid and Clickhouse are designed for low-latency analytical queries at high concurrency with continuous ingestion. Pinot is popular because of it's native integration with streaming systems like Kafka, varied indexing, and it's ability to scale efficiently. They're widely used in observability and user-facing analytics – which are how “real-time analytics databases” are commonly perceived today.
Exasol (and SingleStore, Snowflake, BigQuery, etc) are more focused on enterprise BI and complex SQL analytics rather than application serving, or ultra-high ingest workloads. It performs well for structured analytical queries and joins, but it’s less commonly deployed with the user-facing analytics or high volume usage.
In case you have a single table with time-series data, then Clickhouse will perform typically better. It's very much optimized for this type of use cases.
Once you are joining tables and having more advanced analytics, than Exasol will easily outperform it.
Exasol has been performance leader for more than 15 years in the market, as you can see in the official TPC-H publications, but has not gotten the broader market attention yet. We are trying to change that now and have recently been more active in the developer communities. We also just launched a completely free Exasol Personal edition that can be used for production use cases.
This is the article I wish existed when we were evaluating platforms. "Reliability under realistic conditions is the first scalability constraint". Speed means nothing if queries don't finish.
Traditional database benchmarks focus on throughput and latency – how many queries per second can be processed, how execution time changes as hardware resources increase. This benchmark revealed something different: reliability under realistic conditions is the first scalability constraint.
From my experience, planning is often the first headache I have to deal with (join order, hash sizing, operator choice), before concurrency and memory even come into play.
How you actually interpret what you're seeing here? does it look like more like optimizer fragility (plans that assume ideal memory conditions) or more like runtime memory management limits (good plans, but no adaptive behavior under pressure)?
I think the issue in the tests was the lack of a proper resource management of Clickhouse that led to queries failing under pressure. Although I have to admit that the level of pressure was minimal. Just a few concurrent users shouldn't be considered pressure. Also, having far more RAM than the whole database size means very little pressure. And the schema model is quite simple, just two fact tables and a few dimension tables.
Any database should be able to handle 100 concurrent queries robustly, even if this means to slow down the execution of queries.
TPC-H benchmarks are what convinced us to purchase Exasol 10 years ago. Still happy with that decision! Congrats to the Exasol team on these results vs ClickHouse.