Hacker News new | ask | show | jobs
by proddata 1705 days ago
> is an OLAP database a common go-to for longer-timescale analytics (as in [1])?

I would not consider Clickhouse or CrateDB "classic" OLAP DBs. I can speak for CrateDB (I work there), that it definitely would be able to handle 600GB and query across it in an ad-hoc manner.

We have users ingesting Terabytes of events per day and run aggregations across 100 Terabyte.

1 comments

What kind of hardware requirements would be needed to store and query this much data?
- Depends - Just inserting, indexing, storing and simple querying can be done with little memory (i.e. 1:500 memory-disk-ratio 0.5GB RAM per 1TB disk). Typical production clusters with high query load are in the 1:150 range i.e. 64GB RAM for 10TB disk).

Otherwise typical general purpose hardware (Standard SSDs, 1:4 vCPU:memory ratios, ...)

Interesting, so that'd be about 1 vCPU and 4GB RAM per 625GB of data. That seems very price efficient. Would something like AWS's EBS be sufficient for this? Would you need one of the higher tiers? Or would you be looking at running this on a box with locally attached storage?
Most of CrateDB clusters run on cloud providers hardware (azure, aws, alibaba). Using EBS (GP2 or now GP3) is also quite common. Due to the indexing / storage engine, gp disks are typically sufficient and faster disks have little to no advantage
Wouldn't 0.5GB RAM per 1TB disk be more like 1:2000 memory-disk-ratio? Which is even better!
Sorry, mixed up the number 2GB memory (0.5GB heap). So 1:500 is correct