by zhousun 463 days ago
Hi, Zhou from Mooncake Labs here.

Love your work on PeerDB; it's inspiring the evolution of pg_mooncake (logical replication will be the killer feature for v2).

The core idea of mooncake is to build on an open columnar format plus a substitutable vectorized engine, while integrating natively with Postgres:

1. For small devs, we allow the whole stack to be embedded as a Postgres extension for ease of use

2. For enterprises, our stack is also a purpose-built stack similar to PeerDB + ClickHouse, not a more generalized approach

We allow a gradual transition from 1 to 2.


Thank you for the kind words! :)

On 1, makes sense.

On 2, I understand your thinking around purpose-built — but you're retrofitting an analytical database into a transactional database without fully supporting all the features (both in terms of functionality and performance) of either. It's really hard to be truly purpose-built this way. As a result, users might not get the best of both worlds.

PeerDB is different. We keep Postgres and ClickHouse separate and just move data reliably between them. Users get to query Postgres and ClickHouse in isolation and make the best of each of them.

Anyway, keep up the good work! Just wanted to share some challenges we've seen before when building an analytics extension (Citus), particularly around chasing both Postgres compatibility and performance.

Yep, what I want to say is that the line between the two designs is indeed very blurry.

Logical replication with mooncake will try to create a columnar version of a Postgres heap table that is readable within Postgres (using pg_mooncake) or outside Postgres (similar to PeerDB + ClickHouse) with other engines like DuckDB, StarRocks, Trino and possibly ClickHouse.

But since we can purpose-build the columnstore storage engine with Postgres CDC in mind, we can replicate real-time updates/deletes (especially in cases where a traditional OLAP system can't keep up).
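
To make the update/delete point concrete, here is a rough sketch of one way a CDC-aware column store could apply change events: rows are only ever appended, and an update or delete just marks the old position as dead. This is an illustration under stated assumptions, not pg_mooncake's actual design; the `ColumnarTable` class, its `apply` method, and the event shape are all hypothetical.

```python
class ColumnarTable:
    """Toy append-only column store with a delete set (hypothetical, for illustration)."""

    def __init__(self, column_names):
        self.columns = {name: [] for name in column_names}  # column name -> values
        self.pk_index = {}    # primary key -> position of the live row
        self.deleted = set()  # positions of rows that are no longer live
        self.num_rows = 0     # total appended rows, live or dead

    def apply(self, op, pk, row=None):
        """Apply one change event: op is 'insert', 'update' or 'delete'."""
        if op in ("update", "delete"):
            # Never rewrite column data in place; just mark the old row dead.
            old_pos = self.pk_index.pop(pk, None)
            if old_pos is not None:
                self.deleted.add(old_pos)
        if op in ("insert", "update"):
            # The new row version is appended at the end of every column.
            for name, values in self.columns.items():
                values.append(row.get(name))
            self.pk_index[pk] = self.num_rows
            self.num_rows += 1

    def scan(self, name):
        """Return the live values of one column, skipping dead positions."""
        return [v for i, v in enumerate(self.columns[name]) if i not in self.deleted]


table = ColumnarTable(["id", "amount"])
table.apply("insert", 1, {"id": 1, "amount": 10})
table.apply("insert", 2, {"id": 2, "amount": 20})
table.apply("update", 1, {"id": 1, "amount": 15})
table.apply("delete", 2)
live_amounts = table.scan("amount")
```

The point of the sketch is that updates and deletes become cheap appends plus a small side structure, which is the kind of trade-off a storage engine built around a CDC stream can make; a real engine would also need compaction, durability and transactional ordering, none of which is shown here.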

I understand. In that scenario, why can't users just use these other query engines directly instead of the extension? You're relying heavily on DuckDB within your extension but may not be able to unleash its full power, since you're embedding it within Postgres and operating within the constraints of the Postgres extension framework and interface.

lol spot-on comment and stay tuned for our v2 :)

The focus of mooncake is to be a columnar storage engine that natively integrates with pg, allowing writing from pg, replicating from pg, and reading by pg using pg_mooncake. We want people to use other engines to read from mooncake; in that setup they are effectively stateless engines, which are much easier to manage and avoid all the data ETL problems.

Sounds good. I'm still a bit confused, but will wait for your next version. :) ETL problems still aren't avoided, though: replicating from Postgres sources using logical replication is still ETL. One topic we didn't chat much about: be careful about what you're signing up for with logical replication. We built an entire company just to solve the logical replication/decoding problem. ;)