| And you think https://tech.marksblogg.com/billion-nyc-taxi-rides-clickhous... for example is something that can be considered fast? It takes the user 55 minutes just to load the data into a state where it is "queryable". After importing, they spend 34 more minutes converting the data into a columnar representation. Alright, so 89 minutes in and we still haven't run any queries. Oh, but it's not distributed yet. Darn, I have to run some non-standard SQL commands like CREATE TABLE trips_mergetree_x3
AS trips_mergetree_third
ENGINE = Distributed(perftest_3shards,
default,
trips_mergetree_third,
rand()); OK, can I query my data yet? No, you have to move it into this distributed representation, and that takes 15 more minutes. Oh, OK... And now? Yes, you can run your queries, but they aren't really very fast. SELECT cab_type, count(*)
FROM trips_mergetree_x3
GROUP BY cab_type; Can take 2.5 seconds on a 108 CPU core cluster for only 1.1BN rows? That's not fast. That's particularly slow given that it requires you to ingest and optimize your data first. Maybe you want to show us an example of some simple tests you have run with Blazing and Clickhouse. As I read it now, it's not worth our time to look into because it's so very different from what we are trying to offer, which is: Connect to your files wherever you have them
ETL quickly
Train / Classify
Move on! |
I was hoping to see some serious consideration given to these kinds of benchmarks, considering Clickhouse is one of the most cost-effective tools I've used in the real world and occasionally outperforms things like MapD.
I was expecting your solution to outperform Clickhouse in at least some respects, and a benchmark showing where it wins. Instead, you reveal ignorance of Clickhouse and even of the benchmarks you linked.
Your comment comes off as incredibly arrogant and at the same time incredibly misinformed. Disappointing to see this attitude from the team.