by iknownothow 1172 days ago
1. How many insertions do you expect per second or per minute?

2. What's the size of each insert?

3. At the end of one year, what's the total size of your dataset?

4. How long can your largest and most complex analytical query take to finish? Should it finish in a minute? Is it okay if it takes an hour? Is it okay if it takes up to 24 hours?


1. 80,000 - 125,000 per minute at peak, and we expect a 5-6x increase in the coming few months.

2. Size of each insert: approx. 1 KB.

3. Year-end data size: not available (too early to guess, but average 600-700 GB).

4. Queries must finish in about a minute.
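For a rough sense of scale, the figures above can be turned into per-second rates. This is only back-of-the-envelope arithmetic: it treats 1 KB as 1,000 bytes and applies the 5-6x growth directly to the stated peak, both of which are simplifying assumptions.

```python
# Back-of-the-envelope ingest rates from the figures quoted above.
PEAK_PER_MINUTE = (80_000, 125_000)  # stated peak inserts per minute
GROWTH = (5, 6)                      # expected growth factor (5-6x)
EVENT_BYTES = 1_000                  # ~1 KB per insert (assumed 1,000 bytes)


def per_second(per_minute: int) -> float:
    """Convert a per-minute rate to a per-second rate."""
    return per_minute / 60


def projected_peak_range() -> tuple[int, int]:
    """Lowest and highest projected peak, in events per minute."""
    return PEAK_PER_MINUTE[0] * GROWTH[0], PEAK_PER_MINUTE[1] * GROWTH[1]


low, high = projected_peak_range()
print(f"today: {per_second(PEAK_PER_MINUTE[0]):.0f}-"
      f"{per_second(PEAK_PER_MINUTE[1]):.0f} events/s")
print(f"projected: {low:,}-{high:,} events/min "
      f"({per_second(low):.0f}-{per_second(high):.0f} events/s)")
print(f"projected peak bandwidth: ~{per_second(high) * EVENT_BYTES / 1e6:.1f} MB/s")
```

So the projected peak lands somewhere between 400k and 750k events per minute, which is a modest byte rate but a large number of individual inserts.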

The total dataset size doesn't seem like much at ~1 TB, so you can use ClickHouse for fast analytics like others have suggested. Consider hosting on Hetzner since they have cheap NVMe disks.

The biggest problem you're going to face is ingestion of these events during peaks at 500k events per minute. You can't ingest them individually into ClickHouse or most other databases. So unfortunately you will have to add a streaming layer that buffers these events, so that once every few seconds you can ingest one big batch of 1k-10k events into ClickHouse. AWS API Gateway + Kinesis is operationally easy to set up, quite cheap, and should be able to handle your peak load. Afterwards, use a Lambda to batch >1000 events from Kinesis and insert them into ClickHouse. I've never tested this last part, so I'm not sure how it will work out.
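To make the batching idea concrete, here is a minimal sketch of a micro-batcher that flushes either when enough rows have accumulated or when the oldest buffered event is too old. It is not tied to Kinesis or Lambda: `MicroBatcher`, `sink`, `max_rows` and `max_age_s` are hypothetical names, and `sink` stands in for whatever actually performs the bulk insert.

```python
import time
from typing import Callable, List, Optional


class MicroBatcher:
    """Collects events and hands them to `sink` in batches (by size or age)."""

    def __init__(self, sink: Callable[[List[dict]], None],
                 max_rows: int = 5_000, max_age_s: float = 2.0,
                 clock: Callable[[], float] = time.monotonic):
        self.sink = sink            # hypothetical bulk-insert callable
        self.max_rows = max_rows    # flush once this many events are buffered
        self.max_age_s = max_age_s  # ...or once the oldest event is this old
        self.clock = clock
        self._buf: List[dict] = []
        self._first_at: Optional[float] = None

    def add(self, event: dict) -> None:
        if not self._buf:
            self._first_at = self.clock()  # age is measured from the first event
        self._buf.append(event)
        if len(self._buf) >= self.max_rows or self._expired():
            self.flush()

    def _expired(self) -> bool:
        return (self._first_at is not None
                and self.clock() - self._first_at >= self.max_age_s)

    def flush(self) -> None:
        if self._buf:
            batch, self._buf = self._buf, []
            self._first_at = None
            self.sink(batch)


# Usage: with max_rows=3, seven events become batches of 3, 3 and 1.
batches: List[List[dict]] = []
batcher = MicroBatcher(sink=batches.append, max_rows=3)
for i in range(7):
    batcher.add({"id": i})
batcher.flush()  # push out the remainder
print([len(b) for b in batches])  # [3, 3, 1]
```

Note that this sketch only checks the age limit when a new event arrives; a real consumer would also need a timer (or a final `flush()` on shutdown) so that a quiet period doesn't leave events sitting in the buffer.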

It'd be nice to know what you eventually go with. If you can, please send me a message about what you finally chose.

> ingestion of these events during peaks at 500k events per minute. You can't ingest them individually into Clickhouse or most other databases.

Turn on async_insert or use a Buffer table engine, and you can easily insert them individually into ClickHouse.
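As a rough illustration of the `async_insert` route, the sketch below builds a single-row insert for ClickHouse's HTTP interface with `async_insert=1` and `wait_for_async_insert=1` passed as URL parameters. The table name `events`, the columns in the sample row and the `localhost:8123` address are assumptions, and `build_async_insert` is a hypothetical helper; nothing is sent until you call `urlopen` against a running server.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_async_insert(event: dict, table: str = "events",
                       base_url: str = "http://localhost:8123/") -> Request:
    """Build (but do not send) a one-row INSERT with async_insert enabled."""
    params = {
        "query": f"INSERT INTO {table} FORMAT JSONEachRow",
        "async_insert": 1,           # let the server buffer and batch small inserts
        "wait_for_async_insert": 1,  # respond only after the buffer is flushed
    }
    body = json.dumps(event).encode("utf-8")
    return Request(f"{base_url}?{urlencode(params)}", data=body, method="POST")


# Hypothetical row; the column names are made up for illustration.
req = build_async_insert({"ts": "2024-01-01 00:00:00", "user_id": 42})
print(req.full_url)
# To actually send it (requires a running ClickHouse server):
#   from urllib.request import urlopen
#   urlopen(req)
```

Setting `wait_for_async_insert=0` would return sooner, at the cost of not knowing whether the row was actually written.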

That's interesting! I don't have much experience with ClickHouse, especially not in the last two years. I'll have to try this out myself. That's pretty incredible if it can handle batching internally.