
by breadchris 806 days ago

ClickHouse is awesome, but as the post shows, some code is involved in getting the data there.

I have been working on Scratchdata [1], which makes it easy to try out a column database to speed up aggregation queries (avg, sum, max). We have helped people [2] take a Postgres database with 1 billion rows (1.5 TB) and significantly reduce query time for real-time analytics. Because the data was stored more efficiently, they also saved on their storage bill.

You can send data as a curl request and it will get batch-processed and flattened into ClickHouse:

curl -X POST "http://app.scratchdata.com/api/data/insert/your_table?api_ke..." --data '{"user": "alice", "event": "click"}'
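The post does not spell out what "flattened" means here. As a rough sketch of the general idea only, nested JSON can be collapsed into single-level column names before it is written to a column store. The `flatten` helper and the underscore separator below are assumptions made for illustration, not Scratchdata's actual behavior.

```python
def flatten(record, prefix="", sep="_"):
    """Collapse nested dicts into a single-level dict of column name -> value."""
    flat = {}
    for key, value in record.items():
        # Build the column name from the path of keys leading to this value.
        column = f"{prefix}{sep}{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse into nested objects, carrying the path as the prefix.
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


# Hypothetical event with a nested "device" object.
event = {
    "user": "alice",
    "event": "click",
    "device": {"os": "linux", "screen": {"w": 1920}},
}
row = flatten(event)
print(row)
```

Each key of `row` would then map to one column, which is the shape a columnar table expects.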

The founder, Jay, is super nice and just wants to help people save time and money. If you give us a ring, he or I will personally help you [3].

[1] https://www.scratchdb.com/
[2] https://www.scratchdb.com/blog/embeddables/
[3] https://q29ksuefpvm.typeform.com/to/baKR3j0p?typeform-source...


wiredfool 806 days ago

My first big win with ClickHouse was replacing a 1.2 TB, billion-plus-row PostgreSQL DB. It was static data with occasional full replacement loads. We got the DB down to ~60 GB, with queries about 45x faster.

Now, the Postgres schema wasn't ideal, and we could have saved ~3x on it, with corresponding query speedups, by refactoring it along the lines of the ClickHouse schema, but that wouldn't have been enough to get us to near-real-time queries.

Ultimately, the entire ClickHouse DB was smaller than the original Postgres primary key index. That index was too big to fit in memory on an affordable machine, so it's pretty obvious where the performance is coming from.


hodgesrm 805 days ago

This is a nice illustration of the effects of different choices for storage layout and use of compute. ClickHouse blows away single-threaded queries on row-based data for analytic questions. On the other hand, PostgreSQL can offer far higher throughput and concurrency when updating a shopping cart.
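To make the storage-layout point concrete, here is a toy sketch that compares how many bytes a sum over one field has to scan in a row layout versus a column layout. The record shape (id, user, amount) and the fixed-width encoding are made up for illustration; neither PostgreSQL nor ClickHouse stores data this way internally.

```python
import struct

# Hypothetical records: (id, user, amount).
rows = [(i, f"user{i % 100}", float(i % 7)) for i in range(1000)]

# Row layout: every field of a record is stored together (8 + 16 + 8 bytes).
ROW = struct.Struct("<q16sd")
row_store = b"".join(ROW.pack(i, user.encode(), amount) for i, user, amount in rows)

# Column layout: all values of one field are stored together.
amount_column = struct.pack(f"<{len(rows)}d", *(amount for _, _, amount in rows))


def sum_from_rows(buf):
    """Sum the amount field; every whole record has to be scanned."""
    total = 0.0
    for offset in range(0, len(buf), ROW.size):
        _, _, amount = ROW.unpack_from(buf, offset)
        total += amount
    return total, len(buf)


def sum_from_column(buf):
    """Sum the amount field; only that column's bytes are scanned."""
    values = struct.unpack(f"<{len(buf) // 8}d", buf)
    return sum(values), len(buf)


row_total, row_bytes = sum_from_rows(row_store)
col_total, col_bytes = sum_from_column(amount_column)
print(f"row layout:    total={row_total} bytes scanned={row_bytes}")
print(f"column layout: total={col_total} bytes scanned={col_bytes}")
```

The flip side is that updating a single record in the column layout means touching one location per column, which fits the shopping-cart point: row stores keep each record's fields together for small, frequent writes.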
