| HN Mirror


by philbe77 457 days ago

Thanks for the kind words!

I think the main advantage this has over other RDBMSs such as Postgres and MySQL is performance, thanks to DuckDB's columnar, vectorized processing. I've run this on AWS with an i8g.16xlarge and on Azure with a Standard_E64pds_v6, and I get amazing performance from the NVMe storage, lots of CPU (64 vCPUs) and memory (512GB) - for less than $4/hr in cloud VM cost.

This solution lets users put the resources of a large cloud VM to work as an enterprise-grade server for data notebooks, analytics dashboards, and more.

You can get performance that meets or exceeds that of many distributed systems on a much simpler architecture by reading from Parquet datasets, assuming you've copied (cached) them onto the local NVMe SSD storage. Of course, this has disadvantages, such as keeping the local copy in sync with the cloud storage (S3), but it can be well worth it if you have a mostly read-only dataset (write-once, read-many). There is a small startup-time penalty for hydrating the local storage, but I've seen throughput as high as 4GB/s for this initial copy...
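To make the sync and hydration trade-off concrete, here is a minimal sketch in plain Python. It is not how the server itself does it: `plan_hydration` and `estimate_copy_seconds` are hypothetical helper names, the remote listing is assumed to be a simple mapping of object key to size in bytes, and comparing sizes is only a reasonable staleness check if the dataset really is write-once.

```python
from pathlib import Path


def plan_hydration(remote_sizes: dict[str, int], cache_dir: Path) -> list[str]:
    """Return the remote keys that still need copying into the local cache.

    remote_sizes maps an object key (e.g. "sales/part-0.parquet") to its size
    in bytes, as you might build from an object-store listing (assumed input).
    A key is treated as stale if the local file is missing or its size differs.
    """
    stale = []
    for key, size in sorted(remote_sizes.items()):
        local = cache_dir / key
        if not local.is_file() or local.stat().st_size != size:
            stale.append(key)
    return stale


def estimate_copy_seconds(total_bytes: int, bytes_per_second: float) -> float:
    """Rough hydration time at a sustained copy throughput."""
    return total_bytes / bytes_per_second
```

At the 4GB/s figure mentioned above, `estimate_copy_seconds(10**12, 4 * 10**9)` works out to 250 seconds for 1TB, which is the sort of startup penalty being traded for local NVMe read speed afterwards.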

I hope this helps. Thanks again for the question!