Hacker News new | ask | show | jobs
by wpeterson 872 days ago
If they’re optimizing full table scans of 20M+ rows, they probably want an optimized column oriented DB or a data warehousing option like Snowflake.
4 comments

agreed! PostgreSQL is the wrong tool for large data without indexes. It's also the wrong tool for ultra low latency access. And small, non-persistent data. And and and...

That said, over time PostgreSQL has wildly expanded the range for which it's suitable and if you can and want to bet on the future, it's often a better bet than niche systems.

It's also important to remember that PostgreSQL is decades ahead of other systems in data virtualization and providing backward-compatibility to applications after changes; pushing down computation near the data and avoiding moving billions of rows into middleware, including a world class query optimizer; concurrent data access; data safety and recovery; and data management and reorganization, including transactional DDL. Leaving this behind feels like returning to the stone age.

>> Wrong tool for ultra low latency access

I'm not sure what you mean by ultra low latency but unfortunate to have to rethink what a tool is good for because of RDS / EBS.

And even if you want to stay in the Postgres ecosystem there's options for you there.
For analytics, use a columnar database.

There are even other AWS Postgres-oriented options (check the pricing first):

ZeroETL from Aurora Postgres to (postgres-compatible) Redshift (Serverless?)

Yup. Even gross abuses of Redshift run fine with appropriate roll ups and caching. At a past job we did it “wrong enough” that it took a while for a more state of the art solution to catch up. This is not to say the abuse of Redshift should have been done, but AWS has been abused a lot and the engineers there have found a lot of optimizations for interesting workloads.

But to pick the wrong DB tool in the first place and bemoan it as “not scalable” is a bit like complaining that S3 made for a poor CDN without looking at how you’re supposed to use it with Cloudfront.

Is ZeroETL not in early stages, still? I heard it replicates everything. No filtering yet on parts of the binlog (tables/columns). But other than that, i like the idea.

(I would like to know, where their ZeroETL originated from, usually AWS picks up ideas somewhere and makes it work for their offerings to cash in. A universal replication tool.)

For a mere 20M-70M rows I'd stick with Postgres, index, and materialized view.

After that is when I'll start migrate to duckdb or clickhouse (or citus if I don't want to move out completely from Postgres)

Came here to say this. If you use a hammer to fasten a screw, it's probably not going to work
Perhaps cockroachdb or titaniumdb would be a better choice.
I can’t tell if you’re trolling or not, as those are even more terrible options for analytics workloads . You must be