| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by wpeterson 872 days ago
	If they’re optimizing full table scans of 20M+ rows, they probably want an optimized column oriented DB or a data warehousing option like Snowflake.

4 comments

asah 871 days ago

agreed! PostgreSQL is the wrong tool for large data without indexes. It's also the wrong tool for ultra low latency access. And small, non-persistent data. And and and...

That said, over time PostgreSQL has wildly expanded the range for which it's suitable and if you can and want to bet on the future, it's often a better bet than niche systems.

It's also important to remember that PostgreSQL is decades ahead of other systems in data virtualization and providing backward-compatibility to applications after changes; pushing down computation near the data and avoiding moving billions of rows into middleware, including a world class query optimizer; concurrent data access; data safety and recovery; and data management and reorganization, including transactional DDL. Leaving this behind feels like returning to the stone age.

link

osigurdson 870 days ago

>> Wrong tool for ultra low latency access

I'm not sure what you mean by ultra low latency but unfortunate to have to rethink what a tool is good for because of RDS / EBS.

link

spamizbad 872 days ago

And even if you want to stay in the Postgres ecosystem there's options for you there.

link

gregw2 872 days ago

For analytics, use a columnar database.

There are even other AWS Postgres-oriented options (check the pricing first):

ZeroETL from Aurora Postgres to (postgres-compatible) Redshift (Serverless?)

link

Moto7451 872 days ago

Yup. Even gross abuses of Redshift run fine with appropriate roll ups and caching. At a past job we did it “wrong enough” that it took a while for a more state of the art solution to catch up. This is not to say the abuse of Redshift should have been done, but AWS has been abused a lot and the engineers there have found a lot of optimizations for interesting workloads.

But to pick the wrong DB tool in the first place and bemoan it as “not scalable” is a bit like complaining that S3 made for a poor CDN without looking at how you’re supposed to use it with Cloudfront.

link

okr 869 days ago

Is ZeroETL not in early stages, still? I heard it replicates everything. No filtering yet on parts of the binlog (tables/columns). But other than that, i like the idea.

(I would like to know, where their ZeroETL originated from, usually AWS picks up ideas somewhere and makes it work for their offerings to cash in. A universal replication tool.)

link

ies7 871 days ago

For a mere 20M-70M rows I'd stick with Postgres, index, and materialized view.

After that is when I'll start migrate to duckdb or clickhouse (or citus if I don't want to move out completely from Postgres)

link

imheretolearn 872 days ago

Came here to say this. If you use a hammer to fasten a screw, it's probably not going to work

link

hobobaggins 871 days ago

Perhaps cockroachdb or titaniumdb would be a better choice.

link

dalyons 871 days ago

I can’t tell if you’re trolling or not, as those are even more terrible options for analytics workloads . You must be

link