by bigtimegames 1871 days ago
Sure, why not -- I mean, Redshift is a cluster of Postgres databases as well. What you don't get is scale -- so it's fine for a small data warehouse (in which case, is it really a data warehouse...)

I am not sure I agree with the general idea that Postgres can't scale or, put a bit less strongly, that it is hard to scale. Even in 2008 people were running petabyte-scale warehouses using Postgres:

https://www.toolbox.com/tech/data-management/blogs/2-petabyt...

Since 2008, improvements in parallel query execution (and numerous other improvements) in the core project, plus the availability of forks/extensions that abstract and/or modify various bits to improve scalability (see Citus and Timescale), mean it has never been easier to scale Postgres to some truly staggering heights.

While I wouldn't want to speak in absolutes, there are very few applications where I think Postgres wouldn't be a viable choice as a data warehouse.

Emphasis on warehouse, as I wouldn't want to suggest Postgres as an ideal candidate for a data lake. The difference between them, for me, is whether or not the data is structured/processed, similar to the definition in this article:

https://medium.com/@distillerytech/data-warehouse-vs-data-la...

Personally, I have experience scaling core PostgreSQL (9.4) to handle ingestion of monitoring data for web servers to the tune of 2-3 terabytes a day. Not the grandest of scales, but enough to have seen a few bumps along the way...and, for what it's worth, I think it is surprisingly easy to scale.
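
For a rough sense of what 2-3 terabytes a day means as a sustained rate, here is a back-of-the-envelope sketch. It assumes decimal units (1 TB = 1,000,000 MB) and perfectly even ingestion around the clock, which real monitoring traffic won't be; the function names are just for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def sustained_mb_per_s(tb_per_day: float) -> float:
    """Average ingest rate in MB/s, assuming decimal units (1 TB = 1e6 MB)."""
    return tb_per_day * 1_000_000 / SECONDS_PER_DAY


def days_to_reach(pb_total: float, tb_per_day: float) -> float:
    """Days of raw ingest needed to accumulate pb_total petabytes (1 PB = 1000 TB)."""
    return pb_total * 1000 / tb_per_day


for rate in (2, 3):
    print(f"{rate} TB/day ~ {sustained_mb_per_s(rate):.1f} MB/s, "
          f"1 PB after ~{days_to_reach(1, rate):.0f} days")
```

So that is roughly 23 to 35 MB/s on average, and about a petabyte of raw ingest in somewhere between one and one and a half years, ignoring retention policies, compression, and index overhead.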

I wouldn't want to sign up to scale Postgres to handle exabyte data loads, but single-digit petabytes? Sure.

https://techcommunity.microsoft.com/t5/azure-database-for-po...

And at petabyte-scale, I personally think it qualifies as a data warehouse.