by matthewbauer 1859 days ago
Postgres is one of those pieces of software that’s so much better than anything else, it’s really incredible. I wonder if it’s even possible for competitors to catch up at this point - there’s not a lot of room for improvement in architecture of relational databases any more. I’m starting to think that Postgres is going to be with us for decades maybe even centuries.

Do any other entrenched software projects come to mind? The only thing comparable I can think of are Git and Linux.

14 comments

There's a ton of room for improvement in the architecture of relational databases. This isn't a dig against Postgres, or ignoring how difficult it will be to get a new system to the same level of maturity. But databases designed natively for cloud/clustering, SSDs, (pmem soon perhaps), etc are quite a bit different. There are enormous simplifications and performance gains possible.

There's been a lot of exciting work in this area over the last decade or so. Andy Pavlo's classes are great surveys of the latest work: https://15721.courses.cs.cmu.edu/spring2020/

CosmosDB is an example of a relational (multi paradigm properly) database with a quite different architecture vs the classic design, that's moved into production status quite rapidly.

FaunaDB and CockroachDB are moving with solid momentum too.

Yeah, to list a bit:

- scaling is non-trivial (you can't just add a node and have PostgreSQL automagically Do The Right Thing™)

- you can only have so many connections open to the database, causing issues with things such as AWS Lambda

- I don't remember if this was changed, but I got the impression a while ago that having dynamic DB users was a bit cumbersome to set up (plug PostgreSQL to AD/LDAP)

An external connection pooler like pgbouncer can alleviate some of the simultaneous connection limits

There are projects to automate the syncing of LDAP users to postgres but it would be nice if this was built-in.

However I get the impression that part of the reason these features aren't in the box is to limit scope creep in the main project.

On the lambda point, RDS proxy is a good solution if using AWS.
You must be kidding me with the CosmosDB mention. It doesn't even have foreign key constraints. I have to work with it and I have never seen such a feature-poor dbms before.
I'm talking about the general category of everything built atop BW-Tree and the Deuteronomy architecture. Murat Demirbas's blog has nice summaries of the papers. CosmosDB is a brand that encompasses more than one database engine, but I used that term as most people aren't familiar with the literature on the topic.
Foreign key constraints are not practical for distributed data stores. And are actually more controversial than you’d think for regular databases, due to their heavy performance cost.
There are also technologies like NVMe over Fabric/RDMA, eBPF, XDP, io_uring etc which are just starting to get traction and are game changers for performance. None of which are being used.

All of these require a different architecture so expect to see newer databases push things even further.

People are working on io_uring for PostgreSQL... watch this space... https://github.com/anarazel/postgres/tree/aio
That sounds extremely interesting, would be nice to have more details on this!
wow this looks incredible!
Cockroach is the worst brand for a database ever.

Even Croach would be a massive branding improvement.

This is similar to how gimp is a terrible brand.

It’s no coincidence - the names for Cockroach and GIMP were coined by the same person https://en.m.wikipedia.org/wiki/Spencer_Kimball_(computer_pr...
Well that explains a lot. Imagine the success of all his work had it had better branding.
Let's add Git and Kafka to that list.
I mean... the WORST? For me Mongo takes the cake, but oracle is up there too.
Really? Oracle actually makes a lot of sense to me for a database name (in the 'source of truth' sense, not in the prophet sense).

Mongo, on the other hand, has definitely always had the racist/ableist slur as the first connotation for me.

I've learned almost all the slurs I know from comments or media sources complaining about them. It's the only place they're used in polite society.
It isn't really a surprise in this case since both ethnic Mongolians and those with Down syndrome are not in many Americans' social circles.

"Almost all" does sound like a bit of a surprise, but thinking back on it the only one I know for a fact I heard for the first time outside of a corrective context was from my elder uncle's friends, who enjoyed self-deprecating jokes, usually with slurs for eastern Europeans in them. I first heard those as a child and only realized years later they were offensive. Most others, I think, I honestly have no idea when I was first exposed to them.

This says more about you than the baseline.
Then perhaps count yourself lucky to not have had some of these used against you.
I always thought it was a reference to valuable stuff picked from trash, which I understood to be slang from sanitation workers, but apparently that's local to the NYC area.
I don't know about other languages, but in German "Mongo" is pretty much a forbidden word, as it is a derogatory descriptor for people with Down syndrome and other visible defects, especially movement defects.
> I don't know about other languages, but in German "Mongo" is pretty much a forbidden word, as it is a derogatory descriptor for people with Down syndrome and other visible defects, especially movement defects.

In the UK that would be "mong", for us Mongo is the planet Ming The Merciless is from.

Lecture 1 of that series is surprising and hilarious for a class about databases.
I am not a student at CMU. Are these publicly available online?
Click schedule in the link in my above comment, or in any of the previous classes on the same topic. It's all online. They only restrict a handful of guest lectures, usually from the usual suspects like oracle or amazon.
I'm an enormous fan of Postgres, it's my default go-to RDBMS. But the memory expense of connections is a huge issue and this article doesn't convince me that it's solved.

The machine being used for this benchmark has 96 vCPUs, 192G of RAM, and costs $3k/mo.

My business runs just fine on a 3.75G, 1 vCPU instance. But idle connections eat up a huge amount of RAM and I sometimes find myself hitting the limits when a load spike spins up extra frontend instances.

Sure I could probably setup pgbouncer and some other tools but that's a lot of headache. I'm acutely aware that MySQL (which I dislike because no transactional DDL) does not suffer from this issue. I also don't see this being solved without a major rewrite, which seems unlikely.

So Postgres has at least one very serious fault that makes room in the marketplace. The poor replication story is another.

It isn't solved, and no one claimed it to be solved. The scalability improvement is related to how we build MVCC snapshots (i.e. information which transactions are visible to a session). That may reduce the memory usage a bit, but it's more about CPU I think.

As for the per-connection memory usage, the big question is whether there really is a problem (and perhaps if there's a reasonable workaround). It's not quite clear to me why you think the issues in your case are due to idle connections, but OK.

There are two things to consider:

1) The fixed per-connection memory (tracking state, locks, ..., a couple kBs or so). You'll pay this even for unused connections.

2) Per-process memory (each connection is handled by a separate process).

It's difficult to significantly reduce (1) because that state would be needed no matter what the architecture is, mostly. Dealing with (2) would probably require abandoning the current architecture (process per connection) and switching to threads. IMO that's unlikely to happen, because:

(a) the process isolation is actually a nice thing from the developer perspective (less locking, fewer data races, ...)

(b) processes work quite fine for a reasonable number of long-lived connections, and connection pools address a lot of the other cases

(c) PostgreSQL supports a lot of platforms, some of which may not have very good multi-threading support (and supporting both architectures would be quite a burden)

But that's just my assessment, of course.

I wonder if the amount of RAM used by a new process can be reduced. Code and other RO segments are shared anyway, so it's only basically the new heap and various buffers.

Reducing this amount would also run Postgres in more constrained environments.

There are two parts of this - the memory allocated by OS and internally.

At the OS level, we can't really do much, I'm afraid :-( I don't think we're wasting too much memory there, exactly because a lot of the memory is shared between processes. Which also makes it difficult to determine how much memory is actually used by the processes (the sharing makes the various metrics in ps/top rather tricky to interpret).

As for the internal memory, it's a bit more complicated. We need a little bit of "per process" memory (per-backend entries in various internal data structures, etc.) - a couple dozen/hundred kBs, perhaps. It's hard to give a clear figure, because it depends on max_locks_per_transaction etc. This is unlikely to go away even if we switched to threads, because it's really "per session" state.

But then there are the various caches the processes keep, memory used to run queries etc. Those may be arbitrarily large, of course. The caches (with metadata about relations, indexes etc.) are usually a couple MBs at most, but yes, we might share them between threads and save some of this memory. The price for that would be the need for additional synchronization / locking, etc. The memory used to run queries (i.e. work_mem) is impossible to share between threads, of course.

There's a blog post by Andres Freund with more details: https://www.citusdata.com/blog/2020/10/08/analyzing-connecti...

Overall, there's very little chance PostgreSQL will switch to threads (difficulty of such a project, various drawbacks, ...). But I do agree having to run a separate connection pool may be cumbersome, etc. There was a proposal to implement an integrated connection pool, which would address at least some of those problems, and I wouldn't be surprised if it happened in the foreseeable future.

And this right here is why PostgreSQL will never overtake MySQL and its forks. The entire industry is sick of these excuses regarding process-per-client instead of a proper multi-threaded model. There may have been a valid argument for this 15 years ago, but not anymore.

Your definition of "reasonable number of long-lived connections" is anything but reasonable. Then "connection pools address a lot of the other cases", when a connection pool/bouncer is unwanted, unwarranted, and just adds another point of failure that needs to be deployed and maintained.

I disagree, for a number of reasons.

Firstly, it's not the goal of the PostgreSQL project to overtake MySQL or other databases, but to serve the existing/new users. This also means we're investing the development effort where the benefit / effort ratio is highest. Even if switching from process-based to thread-based model improved the per-connection overhead, the amount of work needed is so huge the benefit / effort ratio is so utterly awful no one is going to do it. There are always better ways to invest the time / effort. Especially when there are practical solutions / workarounds like connection pools.

Secondly, every architecture has pros/cons, and switching from processes to threads might help in this respect but there are other consequences where the process model is superior (some of which were already mentioned). Focusing on just this particular bit while ignoring the other trade-offs is rather misleading.

And no, the arguments did not really disappear. To some extent this is about the programming model (locking etc.), and that did not really change over time. Also, PostgreSQL supports a lot of platforms, some of which may not have particularly great threading support.

I'm not claiming there are no workloads / systems that actually need that many long-lived connections without a connection pool. In my experience it's usually "We don't want to change the app, you have to change the DB!" but fine - then maybe PostgreSQL is not the right match for that application.

> Even if switching from process-based to thread-based model improved the per-connection overhead, the amount of work needed is so huge the benefit / effort ratio is so utterly awful no one is going to do it.

Then other products will emerge and overtake some of PostgreSQL's marketshare in the long run. It's already happening in fact. Just like more efficient and easier to configure webservers like nginx and caddy are gaining marketshare over Apache httpd.

I love PostgreSQL and don't want to see it becoming the next Apache httpd, slowly but surely fading. Perhaps FAANGs could fund such a refactor.

Perhaps a cheaper solution would be to incorporate pgBouncer inside PostgreSQL so it would naturally sit in front of PostgreSQL in the default installation without extra configuration.

> Then other products will emerge and overtake some of PostgreSQL's marketshare in the long run. It's already happening in fact. Just like more efficient and easier to configure webservers like nginx and caddy are gaining marketshare over Apache httpd.

Maybe, we'll see.

It however assumes the other (thread-based) architecture is somewhat universally better, and I doubt that's how it works. It might help the workloads actually requiring many connections to some extent, but it's also likely to hurt other workloads for which the current architecture works just fine.

But let's assume we decide to do that - such a change would be a massive shift in programming paradigm (both internally and for extensions developed by 3rd parties) and would probably require multiple years. That's a huge investment of time/effort, with a lot of complexity, risks and very limited benefits until it's done. I'd bet there'll always be a feature with better cost/benefit ratio.

So reworking the architecture might actually gain us some users but lose others, and drain an insane amount of development resources.

> Perhaps a cheaper solution would be to incorporate pgBouncer inside PostgreSQL so it would naturally sit in front of PostgreSQL in the default installation without extra configuration.

Yes, I already mentioned that's quite likely to happen. There has already been a patch / project to do exactly that, but it didn't make it into PG14.

Setting up pgbouncer is not much of a headache, and for OLTP workloads it works great. You can even see it in the graph: the best performance is when the number of connections equals the number of CPU cores. And so will be memory use. :)
You may be right that it's easy to set up, but pgbouncer doesn't help with this problem most of the time. It's a problem that needs to be solved within postgres.

There are three pooling modes:

- Session pooling. Doesn't help with this issue since it doesn't reduce the total number of required connections.

- Transaction pooling / statement pooling. Breaks too many things to be usable. (eg. prepared statements...)

See the table at https://www.pgbouncer.org/features.html for what features cannot be used with transaction pooling.
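For anyone wondering why transaction pooling breaks session-level features, here is a toy sketch in Python. Everything in it (`FakeBackend`, `TransactionPool`) is made up for illustration and is not how pgbouncer is actually implemented; the only assumption is that the pooler lends out a server connection per transaction rather than per session.

```python
import queue
from contextlib import contextmanager


class FakeBackend:
    """Hypothetical stand-in for one PostgreSQL server connection (no real I/O)."""

    def __init__(self, backend_id):
        self.backend_id = backend_id
        self.session_state = {}  # e.g. prepared statements, SET values


class TransactionPool:
    """Toy pooler: a client borrows a backend only for one transaction."""

    def __init__(self, size):
        self._idle = queue.Queue()
        for i in range(size):
            self._idle.put(FakeBackend(i))

    @contextmanager
    def transaction(self):
        backend = self._idle.get()  # blocks while every backend is busy
        try:
            yield backend
        finally:
            self._idle.put(backend)  # back to the pool as soon as we finish


pool = TransactionPool(size=2)

# 100 "clients" get by with two backends, because each one holds a
# backend only while its transaction is running.
used = set()
for client in range(100):
    with pool.transaction() as backend:
        used.add(backend.backend_id)

# Session state lives on the backend, and the next transaction from the
# same client may be routed to a different backend.
with pool.transaction() as backend:
    backend.session_state["stmt1"] = "SELECT 1"  # "prepare" a statement
    first_id = backend.backend_id
with pool.transaction() as backend:
    second_id = backend.backend_id
    survived = "stmt1" in backend.session_state

print(sorted(used), first_id, second_id, survived)
```

In this toy the second transaction lands on the other backend, so the "prepared statement" is not there. Session pooling avoids that by pinning a backend to a client for the whole session, which is exactly why it does not reduce the number of server connections.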

Personally I don't expect this to be ever improved in PostgreSQL (ie. change from process per connection model to something else), so I design my multi-user apps so that everything works fine with session pooling (quick short sessions/connections to pgbouncer) and connections that wait for NOTIFY get made directly to postgresql server, and are also limited in number.

And it works fine on low-resourced machines that I tend to use for everything.

Switching to a threaded model would be a lot of work, but there is a simpler solution that retains most of the benefits: using a process-per-connection model for active connections only, and allowing a single process to have multiple idle connections.

I follow the mailing list because I'm interested in this exact issue. Konstantin Knizhnik sent a patch implementing a built-in connection pooler in early 2019 that uses a similar approach to what I just described. The work on that has continued to this day, and I'm hopeful that it will eventually be merged.

But how's that different from pgbouncer?

EDIT: I see, it would have session state restore, not just DISCARD like pgbouncer.

I agree - the disparity between the cost of idle connections in Postgres vs MSSQL is hampering our ability to migrate.
Can you elaborate / quantify the memory requirements a bit? I don't have much experience with MSSQL in this respect, so I'm curious how big the difference is.
Sure, SQL Server supports a maximum of 32767 connections each of which use around 128kB. Meaning that if you use the max connections you’ll need 4GB for the connection overhead.

We see no noticeable drop in performance with increased idle connection with our workload.
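A quick check of that arithmetic in Python, using only the two figures quoted above (the ~128kB per connection is taken as given here, not measured):

```python
MAX_CONNECTIONS = 32767     # SQL Server connection limit quoted above
PER_CONNECTION_KIB = 128    # approximate per-connection cost quoted above

total_kib = MAX_CONNECTIONS * PER_CONNECTION_KIB
total_gib = total_kib / (1024 ** 2)  # KiB -> GiB

print(f"{total_kib} KiB is about {total_gib:.2f} GiB")
```

So the worst case is just under 4 GiB, assuming the per-connection figure holds at that scale.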

Why are you migrating out of curiosity? Price reasons?
Yes, we have multiple RDS instances and wish to reduce costs.
Out of curiosity, do you know what causes this?
They mention this in the article. But to sum up, each connection in PG is handled by its own OS process. Postgres behind the scenes is composed of multiple single-threaded applications.

This comes with the advantage for Pg developers (and us!) that they don't need to deal with tons of data race issues, but the trade-off is that, memory-wise, a process takes way more memory than a thread.

Say more about the "poor replication story". I thought replication was pretty good. What's wrong with it?
There's some stuff here with some links you can follow: https://rbranson.medium.com/10-things-i-hate-about-postgresq...
> Do any other entrenched software projects come to mind?

Elasticsearch is underrated here, IMO. Yes, there are alternatives for simple fulltext search. But there’s a lot more it can do (adhoc aggregations incorporating complex fulltext searches, with custom scripted components; geospatial; index lifecycle management) and if you’re using those features, there’s nothing else comparable.

It’s pretty stable, too, once you’ve got the cluster configured. We don’t have outages due to problems with Elasticsearch.

To provide an opposing viewpoint here: ES and its monstrous API and resourcing requirements are a pain to manage and run. It’s a product that has pivoted in so many directions that it’s just become a bit of a mess. I don’t want a full-text search engine that also has graphs, ML, some bizarre scripting feature, log management, etc all stapled in on top. Geospatial and other analytic stuff I’d rather use a dedicated OLAP db like Redshift or ClickHouse.

I’m currently evaluating typesense vs ES for a fts project and typesense is winning so far by simply being “not painful” to deal with.

> I don’t want a full-text search engine that also has graphs, ML, some bizarre scripting feature, log management, etc all stapled in on top

Sure, so use something else. I do need (most all of) that at my work (plus the horizontal scaling), and there's no competition. I know we're not the only ones.

Also, there's nothing bizarre about the scripting feature. There are several options for scripting, it's very flexible, and it suits implementing custom logic when you need it.

And, I'm not saying ES is perfect! I'm saying that there's a set of use-cases that only ES (to my knowledge) can fulfil, and that's complex aggregations also involving complex full-text search, over tera/petabytes of data. Clickhouse can do aggregations, but doesn't have anything close to the search chops (again, to my knowledge).

I don't know about elasticsearch specifically, but I'm skeptical of special-purpose systems for databases.

They are great in some cases and terrible in others, and over time, use cases push database systems into their worst cases. Use cases rarely stay in the sweet spot of a special-purpose system.

That being said, if the integration is great, and/or the special system is a secondary one (fed from a general-purpose system), then it's often fine.

I’m not sure I fully understand your comment (databases that are special-purpose and evolve out of a sweet spot, or special-purpose systems using databases in worst-case ways?).

I certainly wouldn’t say ES is the former. We use it for some complex things that (AFAIK) no other (publicly available; I don’t know what eg Twitter or Google has going on) system could provide at the scale we need. Everything we’re doing is well within the realm of what ES is built for, and it’s the only system built for it. It’s not perfect, but most of our performance issues could be solved by scaling out, where query or index optimization isn’t tractable.

I interpreted (misinterpreted?) your comment to be suggesting ES for wider use cases.
It’s frustrating to need a run-time team for a piece of infrastructure, especially one sold as IaaS.

It’s totally understandable that you’d need developers to have expertise in patterns and anti-patterns, as well as needing an expert to set things up in the first place, but you shouldn’t have to have a dedicated ES monitoring / tuning / babysitting team like Oracle DBAs of yore. That you do, means it isn’t there yet as a product.

ES doesn't need a "run-time team". It just works.
It absolutely does not “just work”, there’s so much to configure and then get-right for your use-case that you almost certainly need people with a solid understanding of the JVM + ES. Let alone fixing it when something inevitably breaks.
No more than any other database. I mean relative to SQL Server, Postgres, MongoDB or any other database. There's no extraordinary difficulty to manage ES above any other production system. It is very usable out of the box, and needs minimal tuning for many use cases. Of course some use cases will require additional tuning and maintenance, sometimes quite a lot if you have a very large system, JUST LIKE ANY OTHER DATABASE SYSTEM.

In our case for a small website serving the general public (a few tens of thousands of requests per day) it just worked OOTB with hardly any tuning or maintenance at all.

Elasticsearch requires lots of hand holding if you have a cluster. Sounds like you're talking about a single instance.

Especially if an index goes down and you need to kick it to continue indexing.

> Do any other entrenched software projects come to mind?

SQLite.

I'm pretty hopeful that DuckDB will replace some of the use of SQLite. SQLite is great but it sucks that it's entirely dynamically typed (the types specified for columns are completely ignored).
> the types specified for columns are completely ignored

They aren’t constraints (except in the case of “INTEGER PRIMARY KEY”), but they also aren’t “completely ignored”, because of type affinity.
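A small Python sketch of what type affinity does in practice, using the sqlite3 module from the standard library. The table and column names are made up for illustration, and this is an ordinary (non-STRICT) table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER, s TEXT)")

# Text that looks like an integer is converted by INTEGER affinity,
# and a number stored in a TEXT column is converted to text.
conn.execute("INSERT INTO t VALUES (?, ?)", ("123", 456))
# Text that does not look numeric is stored as-is: no error is raised,
# so the declared type is not a constraint.
conn.execute("INSERT INTO t VALUES (?, ?)", ("abc", 7.5))

rows = conn.execute(
    "SELECT typeof(n), typeof(s), n, s FROM t ORDER BY rowid"
).fetchall()
for row in rows:
    print(row)
```

The first row comes back with `n` stored as an integer and `s` as text, while the second row keeps `'abc'` as text in the INTEGER column. So the declared types influence storage without being enforced.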

I like to say that "Postgres is a great default". It's generally very good, and also very adaptable to special purposes, so it covers a wide range of use cases.

But saying "so much better" is too strong.

Postgres is good, even great, but this is hyperbole. Postgres has its downsides, autovacuum being one of them.
Although the article doesn't mention it, index bloat will be far better controlled in Postgres 14:

https://www.postgresql.org/docs/devel/btree-implementation.h...

One benchmark involving a mix of queue-like inserts, updates, and deletes showed that it was practically 100% effective at controlling index bloat:

https://www.postgresql.org/message-id/CAGnEbogATZS1mWMVX8FzZ...

The Postgres 13 baseline for the benchmark/test case (actually HEAD before the patch was committed, but close enough to 13) showed that certain indexes grew by 20% - 60% over several hours. That went down to 0.5% growth over the same period. The index growth is much more predictable in that it matches what you'd expect for this workload if you thought about it from first principles. In other words, you'd expect about the same low amount of index growth if you were using a traditional two-phase locking database that doesn't use MVCC at all.

Full disclosure: I am the author of this feature.

Wow, this is actually incredible. One of my biggest gripes with Postgres is going to be solved. Thank you for sending this over!
Thanks.

I forgot to mention that the test case had constant long-running transactions, each lasting 5 minutes. Over a 4 hour period for each tested configuration.

This level of improvement was possible by adding a relatively simple mechanism because the costs are incredibly nonlinear once you think about them holistically, and consider how things change over time. The general idea behind bottom-up index deletion is that we let the workload figure out what cleanup is required on its own, in an incremental fashion.

Another interesting detail is that there is synergy with the deduplication stuff -- again, very nonlinear behavior. Kind of organic, even. Deduplication was a feature that I coauthored with Anastasia Lubennikova that appeared in Postgres 13.

I am not very familiar with this topic, but I need to maintain a large and frequently updated DB, which requires periodic VACUUM FULL with full table locks. So, does PG suffer from index bloat only, which your fix solves, or is there some other type of bloat for general table data too, which will still exist after your improvement?
It's not possible to give you a simple answer, especially not without a lot more information. Perhaps you can test your workload with postgres 14 beta 1, and report any issues that you encounter to one of the community mailing lists.
i think this should have been the headline. really thanks!
Thank you!!
I think many mercurial users would disagree with you about git.
Are we talking about market dominance, mind share or the idea that there's no real competition?

MySQL and Oracle exist. Mercurial and Perforce exist. I'm not sure it's a terrible stretch to compare git and Postgres.

I think the point is that git isn't "so much better" than mercurial, while pgsql has had a lead on mysql for quite some time on a lot of technical measurements.
Postgresql does not have a real clustered index, i.e. one that is maintained with each change. That by itself makes it worse than MySQL for many workloads.
I would say that that's a pretty dubious claim with modern versions of Postgres and MySQL/InnoDB, running on modern hardware. See for example this recent comparative benchmark from Mark Callaghan, a well known member of the MySQL community:

https://smalldatum.blogspot.com/2021/01/sysbench-postgres-vs...

I'm not claiming that this benchmark justifies the claim that Postgres broadly performs better than MySQL/InnoDB these days -- that would be highly simplistic. Just as it would be simplistic to claim that MySQL is clearly well ahead with OLTP stuff in some kind of broad and entrenched way. It's highly dependent on workload.

Note that Postgres really comes out ahead on a test called "update-index", which involves updates that modify indexed columns -- the write amplification is much worse on MySQL there. This is precisely the opposite of what most commentators would have predicted. Including (and perhaps even especially) Postgres community people.

"Is table a heap with indexes on the side or is table a tree with other indexes on the side (i.e. 'clustered index')" is a more complicated discussion.

The former makes it possible to have MVCC (and thus gives you snapshot isolation and serializability) and makes secondary indexes perform faster, at the cost of vacuum or Oracle-style redo/undo/rollback segments with associated "Snapshot too old" issues.

The latter pretty much forces use of locking even for read so queries block each other (but don't require vacuum or something), makes clustering key selective queries perform faster than secondary index ones and makes you think really hard about the clustering key.

It's not really a feature you would have, but a complicated design tradeoff.

MySQL 8 is not that far behind in feature parity. And is ahead when it comes to scalability. So I don't see postgres as necessarily standing alone.
No DDL transactions, no materialized views, the list is endless.

There's almost no reason to pick MySQL for a new project.

MySQL and mariadb have first class temporal tables. Pg has compile requirement and so cannot use in AWS RDS.
I was aware maria had temporal tables, but not mysql proper. Any links you can point me to? Every search is coming up with 'temporary' table info, not temporal.
> MySQL and mariadb have first class temporal tables. Pg has compile requirement and so cannot use in AWS RDS.

There’s a pl/pgsql reimplementation of temporal tables specifically for that use case.

mysql8 has gis/spatial stuff built in now. may not quite be on par with postgis, but... i also don't have to futz with "doesn't come baked in". Dealt with someone who wrote a whole bunch of lat/lon/spatial stuff in client code because we're on postgres but ... he couldn't get postgis installed (then even if he could, figuring out how to convince the ops people to add a new 'thing' in production would have been a delay).

having stuff baked in is often a win.

MySQL has transactions for DDL changes since 8.0.
MySQL has atomic DDL, which means that if a DDL operation fails it is reverted. But PostgreSQL has truly transactional DDL, which means you can run DDL operations inside a transaction and commit or roll back multiple DDL operations at once, rather than each on its own like MySQL does.

https://dev.mysql.com/doc/refman/8.0/en/atomic-ddl.html
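To illustrate what "transactional DDL" means, here is a rough Python sketch. Big caveat: it uses SQLite purely because it ships with Python and also lets DDL take part in a transaction, so treat it as a stand-in for the PostgreSQL behaviour described above, not a test of PostgreSQL or MySQL. The table names are made up.

```python
import sqlite3

# isolation_level=None puts the sqlite3 module in autocommit mode, so
# the BEGIN / ROLLBACK / COMMIT statements below are fully explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)


def table_names():
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


# Two DDL statements in one transaction, rolled back together.
conn.execute("BEGIN")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY)")
conn.execute("ROLLBACK")
after_rollback = table_names()

# The same two statements, committed together.
conn.execute("BEGIN")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY)")
conn.execute("CREATE TABLE audit_log (id INTEGER PRIMARY KEY)")
conn.execute("COMMIT")
after_commit = table_names()

print(after_rollback, after_commit)
```

After the rollback neither table exists; after the commit both do. With atomic-only DDL, each CREATE TABLE would succeed or fail on its own, and a later failure in a migration would not undo the earlier statements.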

MySQL's lack of DDL transactions is a serious shortcoming.
You claim that MySQL 8 is ahead when it comes to scalability. What is the basis of this claim? When I see comparisons of entire systems that rely on a database (that is, not micro-benchmarks) such as the TechEmpower web framework benchmarks [0], I notice that the 'Pg' results cluster near the top, with the "My" results showing up further down the rankings. I understand this isn't version 14 of the former versus version 8 of the latter. But it makes me wonder what the basis of your claims is.

[0] https://www.techempower.com/benchmarks/

Techempower is not a database benchmark. The tests that involve a DB exist to include a DB client in the request flow, not to put any serious load on the database.
Aren't those run on a single node DB server? And the queries don't really seem realistic at all, e.g. single query test fetches 1 out of 10 000 rows, with no joins at all. Fortunes fetches 1 out of 10 rows. This seems extremely trivial.
well if you need more than one server, mysql has vitess, which is huge. postgres has citus, but that is way more complex to setup than vitess.

I still would never use mysql, just because of vitess.

Is there any roadmap for MySQL 9?
Fortran for linear algebra software.

Excel for business spreadsheets.

Java for enterprise server software.

> Fortran for linear algebra software.

Not an expert, but it is my understanding that Julia is becoming an ever more serious competitor day by day.

> Excel for business spreadsheets.

Honest question, what does LibreOffice miss compared to Excel? In any case, (again not an expert) spreadsheets seem quite inferior to a combination of Julia, CSV and Vega (Lite); although there are certainly more people that are familiar with operating Excel.

> Not an expert, but it is my understanding that Julia is becoming an ever more serious competitor day by day.

And Julia uses BLAS which is written in Fortran.

Not necessarily. All of the DifferentialEquations.jl defaults use pure Julia BLASes which outperform the Fortran BLASes. Mainly, RecursiveFactorization.jl and Octavian.jl, which tend to match or outperform MKL and OpenBLAS on our benchmarking computers, form our workhorse.

https://raw.githubusercontent.com/JuliaLinearAlgebra/Octavia...

https://github.com/JuliaLinearAlgebra/Octavian.jl

https://github.com/YingboMa/RecursiveFactorization.jl

Julia has a long way to go to get where Fortran is in terms of stability and maturity; it needs approximately 60 more years.
What issue do you have with the use of RecursiveFactorization.jl in DifferentialEquations.jl? I can't think of a maturity issue so I'm curious what you have found, or whether this comment isn't grounded in specifics.
Did my original post even mention RecursiveFactorization.jl or DifferentialEquations.jl? (Or is this an implicit package promotion? I do not have issues with them anyway, perhaps great work, have not used them...) Regarding your second point, let's see if the language and its various package APIs remain stable, actively maintained, and backward-compatible just ten years from now, let alone three quarters of a century. Such issues do not become visible right away or overnight. I am not against the language, just stating the fact that it has yet to pass the test of time.
> Java for enterprise server software.

Big corporations are horribly inefficient, and enterprise software is necessarily so as a result. If you're saying Java is terrible by nature of being the go-to for enterprise, then that makes sense. It took 20 years for it to swap places with COBOL, and I expect it will be something else in 20 more.

I don't work with Java, but I can think of a few advantages off the top of my head:

- appreciation of backwards-compatibility (here it wins over Python);

- great debuggers and performance tools (e.g. Java Flight Recorder or Eclipse Memory Analyzer);

- easy deployment - you can just give someone a fat JAR (here it wins over all scripting languages, so Python, Ruby, PHP, or any other flavour of the month);

- industry-grade garbage collectors;

- publicly available standard spec (here it wins over all the defined-by-implementation languages such as Python, PHP, Rust, basically most languages, and over languages which are standardized, but whose specs aren't public: C, C++, Ruby);

- kind of like the previous point, but anyway: multiple implementations to choose from;

- I've been told it has good performance. I've never seen a real-world Java application which felt fast, but I've heard people put it on a pedestal, and the Debian programming language benchmarks game seems to corroborate that story.

Besides, the question wasn't about which technologies we like, but which we believe are so entrenched that they aren't going to go away for a very long time. I don't see Java going away for another 100 years, no matter how much I would or wouldn't like to work with it.

- Fantastic battle-tested ecosystem of libraries.

- Stable cross-platform (kills Python, Node here).

- Lingua franca.

Now I personally don't like Java - it feels crusty vs C# - but the libraries are amazing.

You can also use something nice like Kotlin and you have all of the platform benefits with none of the crusty language issues.

I started using Java 16 after a long hiatus from Java 7 (doing Rust and Clojure instead) - I'm pretty happy with some of the new language features: lambdas, records, type inference, streams.
IMO the Java stdlib also strikes just the right balance between control and abstraction. You can write thread-safe, performant code that makes reasonable tradeoffs between data structures without worrying too much about the details about memory layout and allocation. Said code also is easy to debug even without a debugger because there's almost never undefined behavior caused by use-after-free type bugs and error messages are clear. And the tooling - just IDEs alone, never mind debuggers - is mature and effective.

After using Python, Go, PHP, and C++, it's easy to see why Java is the go-to language for server development.

It wouldn't need as much research into efficient GCs if it were possible to write efficient programs in it. E.g. everything has a lock word, there are no value types or fixed-length arrays, and you have to allocate boxed integers.
People complain about Java's verbosity, but I see that as a feature in places where there's a revolving door of consultants working on things. Everything is so explicit that it is easy to see what some code does.
I think anyone who has worked a lot with MSSQL would disagree with Postgres being "so much better". It's only really in the last few years that Postgres has pulled ahead; MSSQL was a lot more feature-rich and performant for a decade.
MSSQL? As in Microsoft SQL Server? I have heard this argument a lot, and all the comparisons I have seen are specific benchmarks on specialized hardware. My own personal experience wasn't anything like the benchmarks.
MSSQL still has a few features that set it apart from Postgres. Off the top of my head are Filestream (basically storing files in the database while still having them accessible as files on the filesystem) and temporal tables without the need for extensions.

Personally, if I were choosing the tech stack for my company, I'd still go for Postgres though.

By "few years" he has to mean 10 to 15 years. ;)
Kubernetes when it comes to clustering.
I had to roll back to 9.6 on Windows because \COPY is fundamentally broken for large CSVs.
What's the issue? Just on Windows? Mac OS X with 13.2 had no issue for me with the 1.1 gigabyte, 20 million record CSV I imported just last week, or with some bigger ones I did a few months back.
Same here: a 500MB, 10 million row CSV file with no issues on Postgres 11.8.
What exactly makes Postgres better than MySQL? There seem to be certain design decisions, like WAL or process per connection, that cause problems at scale.

https://eng.uber.com/postgres-to-mysql-migration/

That article really isn't a good critique of Postgres.
Anyone who's ever had to upgrade Postgres knows Postgres can't fail fast enough. They must fix their upgrade paths and their endless means to completely fuck you if they want to be taken seriously.
? What's wrong with pg_upgrade?