Hacker News new | ask | show | jobs
by micmil 2599 days ago
I'm not a database guy so have no clue, but why are there so many versions receiving support? Is there just that much legacy crap they can't get away from, like Python?
10 comments

People are slow to upgrade database systems, as it can take a log of regression testing to make absolutely sure your applications don't rely on unsupported/undocumented/undefined behaviours that make them compatible with the newest release (or are affected by officially acknowledged breaking changes). Especially in enterprise systems. Even if developers upgrade quickly, their clients with on-prem installations may not. That means that to be taken seriously you need to support your major and minor releases for some time to be accepted as a serious option in some arenas.

Supporting five versions is no more than MS do: currently SQL Server versions 2017, 2016sp2, 2016sp1, 2014sp3, 2014sp2, 2012sp4, 2008R2sp2 and 2008sp3. 2008sp3, 2008R2sp2, and 2016sp1 will hit their final EOL in a couple of months taking SQL Servers's supported list back down to 5 too.

I expect other significant DB maintainers have similar support life-time requirements for much the same reasons, though I'll leave researching who does[n't] as an exercise for the reader.

2008 and R2 are still in a supported phase of life. It's the "exorbitant support fee" phase. Nevertheless, you can still get Microsoft support for the two after the "EOL". It's more an end-of-public life
Aye, and by the same technicality you can still get support for 2005.

Similar with PG I assume. You could always pay someone an expensive contracting fee to support your use of an older version than is publicly supported.

With databases being often mission critical, the PostgreSQL people decided heroically to support major versions for 5 years -- and as they come out with a new major version every year, minor updates come out for 5 different branches.

Note the recent versioning change: 9.4, 9.5, 9.6 were the previous 3 major versions bases, and the last two are 10 and 11.

1) It's stateful, so upgrades also have to upgrade the state (MBs, GBs, TBs of data)

2) It's horrifically high risk because downgrading is usually not a thing

3) It usually requires downtime.

Moving between major versions of Postgres requires downtime proportional to the size of the database. Supporting older versions allows users to go many years without having to do this.
I upgraded from 9.3 → 11.2 a few months ago using pg_upgrade[1], on a master+slave database with 150GB of data. I did a fair amount of testing, but the final procedure was very fast and smooth.

1. Test the upgrade: set up an additional secondary (9.3), break the replication link (promote it to a master). Test the upgrade on that. It was really fast, under 30 seconds to shut down the old DB, run the in-place upgrade, and start up the new DB.

2a. In production: set up an additional secondary (9.3). Make the primary read-only. Promote the new secondary to a master. Shut down, upgrade to 11.2, restart. Point applications at it.

2b. Backout plan: leave the applications pointing at the original database server, make it read-write.

There are other options, including with only seconds of downtime, but <1 minute with pg_upgrade was simple and very acceptable for us.

[1] https://www.postgresql.org/docs/current/pgupgrade.html

[2] https://www.postgresql.org/docs/current/upgrading.html

This is a nice way to do that, but you have a low volume of data, and you think 30 seconds is fast and 1 minute of downtime is acceptable. I question these assumptions.

Consider the situation when you're adding thousands of new records per seconds, and the database is being used every second (quite literally: to compute per seconds statistics).

A better solution is to have triggers on the old master, to do the same inserts on the new master (after copying the data/promoting a replica/whatever), and have similar triggers on the new master when the IP is not the old master (to be able to backout to the old server)

Then both the new and the old master run "in parallel", with the same data, and you can have the apps use the new server (on a new domain name, new ip, new port, whatever) when you want - on a app by app basis if you want. You can keep both until you decide to decommission the old master.

That's a good plan for someone with a larger or busier database, but my approach was the correct solution for us. You can be satisfied that your taxes aren't wasted by us implementing anything more complicated :-)

(Most of the writes to our PostgreSQL server are batch processes, decoupled with a message broker, and the message consumers were easily paused during the upgrade.)

Not everyone is google, and hec. even google an their services are down sometimes.

If there's not too much data, it seems like the safest and fastest option.

Postgres users actually generally upgrade faster than those using other databases because there are a lot of new features each year. But once your database gets huge then upgrading still becomes a pain, so that's why they keep providing security support and bug fixes for older versions as well.
pg_upgrade with the --link option is extremely fast and doesn't really depend on the size of the database.
Interesting. I just use RDS and it always seems to take 15+ min even though our database is tiny.
Well supported older releases of the database engine, with clearly defined migration documentation and technology -- are the hallmark of successful Open source software ecosystem.

Because it mirrors and supports the reality of the business world.

Every large or small organization that manages their business, every year make 'Grow/Invest', 'Maintain', 'Disinvest' decision for each of the product/service lines.

Does not matter if is software, or making kielbasa. Postgres is exceptional, and is supporting the first 2.

With semantic versioning each time the major version changes it signifies a breaking change. If you have an application that breaks from one of those breaking changes you may not see it as a business opportunity to update because it “works” as it is. However, minor version changes can include anything that doesn’t break. So security patches are hopefully added to any major version that is officially supported.
PostgreSQL versioning is similar to semantic versioning, but doesn't follow it precisely. Major versions require a dump and restore (or other transform, such as an upgrade) of the on-disk data. Minor versions are fixes. Prior to PostgreSQL 10, the changes in the second numeric place are considered "major" versions. So, the past 5 major versions are 11, 10, 9.6, 9.5, and 9.4. The most recent versions of each of those are respectively 11.3, 10.8, 9.6.13, 9.5.17, and 9.4.22.
It's not "legacy crap". There just is long-term-support for versions cause it's not that easy to upgrade (both technically & others).

There's legacy crap everywhere, all langs,db,versions etc. Supported sometimes for 10+ years.

why did you have to bring Python into this? Every language used widely will have "legacy" crap.
not broken, don't fix
There are some shockingly old releases of PostgreSQL still in production for this reason.

Security updates should push the upgrade path a little harder, but there are still cases where a database can be completely isolated from the network and that might not even matter.

I inherited a production system with a PostgreSQL 8.1 database. It's one of the most reliable systems I have.