by hodgesrm 2953 days ago
> [2] When I say Postgres is great for analytics, I mean it. In case you don’t know about TimescaleDB, it’s a wrapper on top of PostgreSQL that allows you to INSERT 1 million records per second, 100+ billion rows per server. Crazy stuff. No wonder why Amazon chose PostgreSQL as its base for Redshift.

Correction: Amazon chose ParAccel, which was a data warehouse forked from PostgreSQL.

Many data warehouse products have followed this path because of licensing. MySQL is GPLv2, which means you can't distribute derivative works without releasing your source code under the same license. PostgreSQL has a permissive license similar to MIT/BSD, so you can do almost anything you want with the code. That's still a major consideration, and the article omitted it.

(Cross-posted from another HN link to same article.)

2 comments

Also, inserting directly into Redshift is strongly discouraged because it performs very poorly.

> An anti-pattern is to insert data directly into Amazon Redshift, with single record inserts or the use of a multi-value INSERT statement, which allows up to 16 MB of data to be inserted at one time. These are leader node–based operations, and can create significant performance bottlenecks by maxing out the leader node network as data is distributed by the leader to the compute nodes.

https://aws.amazon.com/blogs/big-data/top-10-performance-tun...
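
For what it's worth, the usual alternative to row-by-row inserts is a bulk load with Redshift's COPY command from files staged in S3, which the compute nodes can read in parallel. A minimal sketch in Python that only builds the statement text; the helper name, table, bucket, and role ARN are made-up placeholders, and you would still have to run the result through whatever PostgreSQL-compatible driver you use.

```python
import re

# Hypothetical helper: builds the text of a Redshift COPY statement.
# It does not connect to anything; executing the statement is left to the caller.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)?")


def build_copy_statement(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a COPY statement that bulk-loads CSV files from an S3 prefix."""
    # Only allow plain (optionally schema-qualified) identifiers for the table.
    if not _IDENTIFIER.fullmatch(table):
        raise ValueError(f"unexpected table name: {table!r}")
    # Reject single quotes so the values cannot break out of the SQL literals.
    for value in (s3_prefix, iam_role_arn):
        if "'" in value:
            raise ValueError(f"single quote not allowed in: {value!r}")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV;"
    )


if __name__ == "__main__":
    # All three arguments are placeholders, not real resources.
    print(
        build_copy_statement(
            "events",
            "s3://example-bucket/events/",
            "arn:aws:iam::123456789012:role/ExampleRedshiftLoad",
        )
    )
```

The point is that one COPY over many files replaces thousands of INSERT statements going through the leader node.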

PostgreSQL is good for analytics, but it doesn't scale well to large data volumes. I moved my analytics to ClickHouse and got 1000x better performance.