by paulasmuth 3572 days ago
That's almost 70 Gbit/s (are those CloudFlare HTTP logs, by any chance?) on 100 nodes vs. ~170 Mbit/s on 6 nodes.

Or, in other terms, 700 Mbit/s per host with your Kafka setup versus ~30 Mbit/s per host in the benchmark. That said, your machines seem to be quite a bit beefier (I wonder if all that RAM is actually used?).
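
For anyone who wants to check the per-host figures, here is a minimal sketch of the arithmetic. It assumes decimal units (1 Gbit/s = 1000 Mbit/s) and an even spread of traffic across nodes; the function name is made up for illustration.

```python
def per_host_mbit(total_mbit_per_s: float, hosts: int) -> float:
    """Average throughput per host in Mbit/s, assuming an even spread."""
    return total_mbit_per_s / hosts


# ~70 Gbit/s across 100 nodes (the Kafka setup discussed in the thread)
kafka_per_host = per_host_mbit(70_000, 100)

# ~170 Mbit/s across 6 nodes (the benchmark)
bench_per_host = per_host_mbit(170, 6)

print(f"kafka:     {kafka_per_host:.0f} Mbit/s per host")
print(f"benchmark: {bench_per_host:.1f} Mbit/s per host")
print(f"ratio:     {kafka_per_host / bench_per_host:.1f}x")
```

The benchmark works out to a bit over 28 Mbit/s per host, which is where the "~30" comes from.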


A lot of it is log data from requests passing through CloudFlare. We run them through Kafka and consumers do stuff like attack detection and generate statistics for our customers. We have about 4 million sites on CloudFlare and each customer has access to analytics about their site which are stored in a CitusDB database.
Impressive. Any chance you could tell how much data is stored for the analytics service after pre-aggregation? (In terms of TB/day or so - I guess it can't be the full 70gbit/s?).
No, nothing like that much data. It's aggregates like "requests per second" and "attacks per second" etc. The actual log lines aren't stored.

More details: https://blog.cloudflare.com/scaling-out-postgresql-for-cloud...

Thanks. I had read that article some time back :)

Sadly it does not give a figure/order of magnitude of the amount of data that's stored in citus after aggregation, but I guess it's just not public information. [I'm working on a system that is somewhat similar to CitusDB (eventql.io) and am always really interested in these numbers]

EDIT: I can't reply to your other comment for some reason but many thanks for digging that up, it's very interesting info to me!

I'd be happy to make the numbers public; I just don't have them in front of me. It looks like data is going into the CitusDB cluster at about 15 Mbps.
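
To put that in the "TB/day" terms asked about earlier, a rough conversion sketch follows. It assumes "Mbps" means megabits per second, that the rate is sustained around the clock, and decimal units (1 GB = 10^9 bytes); the helper name is hypothetical, and this is back-of-the-envelope arithmetic, not a measured figure.

```python
SECONDS_PER_DAY = 86_400


def mbit_per_s_to_gb_per_day(mbit_per_s: float) -> float:
    """Convert a sustained rate in Mbit/s to decimal gigabytes per day."""
    bytes_per_s = mbit_per_s * 1_000_000 / 8
    return bytes_per_s * SECONDS_PER_DAY / 1e9


# ~15 Mbit/s going into the CitusDB cluster after aggregation
citus_gb_per_day = mbit_per_s_to_gb_per_day(15)

# ~70 Gbit/s of raw log traffic, for comparison, expressed in TB/day
raw_tb_per_day = mbit_per_s_to_gb_per_day(70_000) / 1000

print(f"aggregates: ~{citus_gb_per_day:.0f} GB/day")
print(f"raw logs:   ~{raw_tb_per_day:.0f} TB/day")
```

Under those assumptions the aggregated stream is on the order of a few thousand times smaller than the raw log traffic.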