Hacker News new | ask | show | jobs
by cameronh90 268 days ago
For a while I was maintaining software that supported both MSSQL and PGSQL, and I found that, when comparing like-for-like without DB-specific tuning, MSSQL produced better query plans on average. On a database without write contention, I'd often see 30% better performance on MSSQL.

However, it was also much more likely to hit an AWFUL pathological case which completely wrecked the performance as you describe. Combined with pessimistic locking, we ended up with far more overnight support calls from the MSSQL backend than from the PGSQL backend. Usually because it suddenly decided to switch query plan at 1AM.

I wonder if there's a trade-off where an optimizer produces better average query plans but worse outliers.

3 comments

> I wonder if there's a trade-off where an optimizer produces better average query plans but worse outliers.

There is. A (now) classic case is running a nested loop with a sequential scan on the inner side. Works great if the outer side produces few rows as expected, a disaster if there's a misestimation and you have a thousand rows there.

The MSSQL and Postgres planners are different enough that you can't really say it's about one thing, though (MSSQL is one of the few implementations of the Cascades framework in a sort-of hybrid AFAIK, Postgres is straight-up System R).

I think the big difference is that PostgreSQL doesn't cache query plans at all by default. Only if you use prepared statements. My understanding is that MSSQL does heavily cache them.

It sounds plausible to me that caching would often lead to significant performance improvements overall, but trigger bad plans much more often since the plans are not re-evaluated based on the statistics of each single query. So in Postgres you'd get individual queries with pathological performance when the statistics are off, in MSSQL all executions of that query have bad performance until the plan is re-evaluated again.

We're experiencing the same with MSSQL, and for our most important queries have started adding a constant-valued dummy column to the SELECT section which value changes every few minutes. Essentially an integer equal to UNIX time divided by 600 or similar.

That way a cached bad plan can't cause issues for more than a few minutes, which is acceptable for our use-case.

It's a sledgehammer but it was easy to add and it works.

In Oracle many years ago, we did the same thing by prepending a SQL comment to the query string. You’d think that the plan cacher normalizes queries first, but I guess not.

That might work in your case as well, without requiring modifications in logic to support the dummy field?