by breadchris
806 days ago
ClickHouse is awesome, but as the post shows, getting the data there involves some code. I have been working on Scratchdata [1], which makes it easy to try out a columnar database to speed up aggregation queries (avg, sum, max).

We have helped people [2] take a Postgres database with 1 billion rows (1.5 TB) and significantly reduce its real-time analytics query time. Because their data was stored more efficiently, they also saved on their storage bill.

You can send data as a curl request and it will get batch-processed and flattened into ClickHouse:

  curl -X POST "http://app.scratchdata.com/api/data/insert/your_table?api_ke..." --data '{"user": "alice", "event": "click"}'

The founder, Jay, is super nice and just wants to help people save time and money. If you give us a ring, he or I will personally help you [3].

[1] https://www.scratchdb.com/
[2] https://www.scratchdb.com/blog/embeddables/
[3] https://q29ksuefpvm.typeform.com/to/baKR3j0p?typeform-source...
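
For anyone wondering what "batch-processed and flattened" could look like in practice, here is a minimal sketch in Python. It is not Scratchdata's actual code: the function names `flatten` and `batch`, the underscore separator, and the batch size are all assumptions made for illustration. The idea is that nested JSON turns into flat column-like keys, and rows are grouped so a columnar store receives a few large inserts instead of many tiny ones.

```python
from typing import Any, Dict, Iterable, Iterator, List


def flatten(record: Dict[str, Any], prefix: str = "", sep: str = "_") -> Dict[str, Any]:
    """Collapse nested dicts into one level of column-like keys (hypothetical helper)."""
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so {"user": {"name": ...}} becomes "user_name".
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def batch(records: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` flattened rows (hypothetical helper)."""
    chunk: List[Dict[str, Any]] = []
    for record in records:
        chunk.append(flatten(record))
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:
        # Emit the final, possibly smaller, batch.
        yield chunk
```

The flat payload from the curl example passes through `flatten` unchanged, while something like `{"user": {"name": "alice"}, "event": "click"}` would come out with the keys `user_name` and `event`.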
|
Now, the Postgres schema wasn't ideal, and a refactor along the lines of the ClickHouse schema could have saved us ~3x on storage, with corresponding speed increases for queries, but that wasn't really enough to move the needle to near real-time queries.
Ultimately, the entire ClickHouse DB was smaller than the original Postgres primary key index. The index was too big to fit in memory on an affordable machine, so it's pretty obvious where the performance is coming from.
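
To make the "index doesn't fit in memory" point concrete, here is a rough back-of-envelope estimate in Python. The row count and key width in the usage note are illustrative assumptions, not figures from this deployment. The per-entry overheads (8-byte index tuple header, 4-byte line pointer, 8-byte alignment, 90% default B-tree fillfactor) reflect Postgres's usual layout, and the estimate ignores page headers, inner pages, and bloat, so real indexes tend to be larger.

```python
import math


def estimate_btree_index_bytes(rows: int, key_bytes: int, fillfactor: float = 0.90) -> int:
    """Rough leaf-level size of a Postgres B-tree index (hypothetical helper)."""
    index_tuple_header = 8  # item pointer (6 bytes) + info bits (2 bytes)
    line_pointer = 4        # per-entry slot in the page's item array
    align = 8               # key data is padded to 8-byte boundaries
    aligned_key = math.ceil(key_bytes / align) * align
    per_entry = index_tuple_header + aligned_key + line_pointer
    # Leaf pages are only filled to `fillfactor` when the index is built.
    return int(rows * per_entry / fillfactor)


def fits_in_memory(index_bytes: int, ram_bytes: int) -> bool:
    """True if the whole index could be cached in the given amount of RAM."""
    return index_bytes <= ram_bytes
```

With an assumed 1 billion rows and a 16-byte key (for example a UUID), `estimate_btree_index_bytes(1_000_000_000, 16)` lands at roughly 31 GB, which already would not fit on a machine with 16 GiB of RAM before any table data is cached.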