| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by anonetal 3185 days ago
	Aside from tooling, those systems often perform much better than PostgreSQL for large queries or transactions, as they feature much better optimizations. Even outside of newer optimizations like "columnar" storage, several of those systems do code generation from queries to avoid function calls, branches, etc., which can have huge performance implications. I worked on the internals of PostgreSQL once, and the number of function calls in the innermost loops were very high. PostgreSQL also used to be (is?) single-threaded, which limited performance of a single query on multi-core machines -- I haven't looked into it to see if there has been any fundamental change in the architecture in the last 4-5 years.

3 comments

grzm 3185 days ago

> PostgreSQL also used to be (is?) single-threaded, which limited performance of a single query on multi-core machines

From the submission:

"Improved Query Parallelism - Quickly conquer your analysis"

Query parallelism was introduced in 9.6 and expanded in 10.

link

anonetal 3185 days ago

Yes, I was just reading through that. The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work. This makes sense for PostgreSQL, but has some fundamental limitations (e.g., it requires duplicated copies of a hash table to parallelize a hash join).

link

endorphone 3185 days ago

>The server is still single-threaded though -- they are getting the parallelism by starting multiple processes to do independent chunks of work.

So...it isn't single threaded then? I mean that is exactly how the most advanced competitors operate (Oracle, SQL Server) as well -- a given connection stays on one thread, with the advantages that confers, unless the planner decides to parallelize.

link

adzm 3185 days ago

To be technical, MSSQL uses its own bespoke scheduling, and will preempt the thread for io. All io is nonblocking. The physical thread can vary for this reason. PGSQL really does use synchronous io and a single thread though. The former is probably more scalable but the latter has been serving PGSQL fine, too.

link

tychver 3184 days ago

I think bitmap heap scans have had concurrent IO for quite a while now? There's the effective_io_concurrency setting for it.

link

jeffdavis 3185 days ago

No, processes don't create fundamental limitations. They can still share memory, it's just an "opt-in" choice.

Postgres processes share memory for all kinds of things. Hash tables may be duplicated, but not due to any fundamental limitations.

link

halayli 3185 days ago

PostgreSQL uses shared memory, it doesn't copy the hash table.

link

anarazel 3185 days ago

In the specific case of hashjoins, it does build them independently right now. There's a patch to rectify that though, by putting the hashtable also into shared memory. The coordination necessary to make multi phase batch joins and other such funny bits work, unfortunately made it infeasible to get into 10.

link

halayli 3185 days ago

I stand corrected, it definitely reconstructs the hash table in each process.

link

anarazel 3185 days ago

FWIW, here's the patchset to fix that: https://commitfest.postgresql.org/15/871/

link

Tostino 3185 days ago

Yes there has been. This release expanded it significantly.

link

brightball 3185 days ago

The biggest issue with SQL Server is that it is myopic. The tooling and everything around it is geared toward only SQL Server. The database itself is also geared around only SQL Server...making it a huge pain to get your data out to use it with something else like Elastic Search. It's geared towards being comfortable enough to lock you in and hold your data hostage.

link