Hacker News new | ask | show | jobs
by jurgenkesker 906 days ago
Wondering why nothing is said about compression? For MySQL you have Rocksdb with high ZSTD compression. Is the Postgres solution to just use compressed filesystem like ZFS? I could not find much info when researching Postgress, except for Toast compression, which didn't seem very strong.

I have a huge DB which would take many TB but now runs on a 500GB footprint. Thanks to Rocksdb/MySQL. Pondering a move to Postgress, but seemed like a step down in this regard.

4 comments

This is what keeps me from postgres for large amounts of data. I use postgres to store metadata or small data (ie: "users logged in currently", "customers", etc).

If you want large datasets (petabytes) you really need to look elsewhere to something with better compression support or that tiers its data off to S3.

It's possible that with bcachefs we're like a decade away from "good fs in mainline kernel with fs compression" but right now it's not a great situation.

I'm not familiar with Rocksdb but if it's just a KV store you want, I'd probably just use Redis with compression. It is hard to beat.

Otherwise, Timescaledb offers compressible table chunks using table inheritance. It's really pretty slick, depending on your use-case.

RocksDB is a fork of Google's LevelDB, a KV store using Log Structured Merge Tree (LSM tree) which is great for high write workloads. The MySQL storage engine that uses RocksDB is called MyRocks[0].

[0] http://myrocks.io/

YugabyteDB is a distributed Postgres and their storage layer is called DocDB, a slightly modified RocksDB.
I think bringing RocksDB into Postgres as a Table access method would really be a powerful feature, as an alternative to heap
I think the page features patch (which is a precursor to TDE, which Jonathan does mention) likely paves the way for better compression options.
so, why wouldn't you use btrfs or zfs?
Because compression is robbing Peter to pay Paul. It saves money on storage, but costs money in compute.

And a lot of data is already compressed (images, etc) so might increase your compute cost without any storage savings.

Filesystems are often the wrong level in the stack to do compression. You want individual control over what data is compressed.

> so might increase your compute cost without any storage savings.

I agree with this point, but I think those online compression algorithms add very little overhead, so it is mostly about if Postgres adds enough other benefits compared to your current DB.

The bigger issue potentially is that ext4 is faster than zfs and btrfs itself regardless of compression.

The compression penalty is actually massive. The bytes you're trying to read might require reading other bytes elsewhere on the disk in order to decompress them.

It's not just compute power, I was generalising with that statement. The real world speed hit can be as bad as read speeds an order of magnitude slower than if you disable compression.

You really only want to compress data where it makes sense to compress it.

> The bytes you're trying to read might require reading other bytes elsewhere on the disk in order to decompress them.

data is compressed and decompressed in blocks..