by chrstr
1349 days ago
> Queries using SELECT DISTINCT can now be executed in parallel. This sounds quite interesting, but I would assume it does not always work? I didn't see this mentioned in the linked documentation. Does anyone know when and how the parallel distinct works?

Generally speaking, to do a distinct you need a dictionary (in practice a hash set) to look up previously seen values. To do it in parallel, you need to make that dictionary thread-safe.
In Java, such a thread-safe dictionary can be built by splitting the table into segments and synchronizing on each segment (this is how ConcurrentHashMap worked before Java 8). So you'd hash your value, figure out which segment the hash targets, lock that segment, and then read or update that segment to record the new value.
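
The segment-locking idea can be sketched in a few lines of Java. This is only an illustration under my own assumptions: the class name `StripedDistinctSet`, its methods, and the segment count are made up for the example, and it says nothing about how Postgres or the JDK actually implement this.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a segment-locked ("striped") set for DISTINCT.
// Null values are not supported because we call hashCode() on the value.
final class StripedDistinctSet<T> {
    private final Set<T>[] segments;
    private final ReentrantLock[] locks;

    @SuppressWarnings("unchecked")
    StripedDistinctSet(int segmentCount) {
        segments = (Set<T>[]) new Set[segmentCount];
        locks = new ReentrantLock[segmentCount];
        for (int i = 0; i < segmentCount; i++) {
            segments[i] = new HashSet<>();
            locks[i] = new ReentrantLock();
        }
    }

    // Returns true only for the first thread that inserts a given value.
    boolean addIfAbsent(T value) {
        // Hash the value and pick the segment it belongs to.
        int index = Math.floorMod(value.hashCode(), segments.length);
        locks[index].lock(); // only this segment is blocked, not the whole table
        try {
            return segments[index].add(value);
        } finally {
            locks[index].unlock();
        }
    }

    // Collects every distinct value; meant to be called after workers finish.
    List<T> snapshot() {
        List<T> out = new ArrayList<>();
        for (int i = 0; i < segments.length; i++) {
            locks[i].lock();
            try {
                out.addAll(segments[i]);
            } finally {
                locks[i].unlock();
            }
        }
        return out;
    }
}
```

Each worker thread would call `addIfAbsent` for every row it scans. Threads only contend when their values hash to the same segment, so more segments generally means less contention.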
I'd assume that Postgres is doing a fairly similar trick. The only additional synchronization would be on a linked list of found values. For that, you could lock the list and update it as new values come in, sort the values after the fact, or employ a lock-free algorithm to add nodes to the list (see lock-free queue implementations).
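
For the lock-free option, here is a minimal sketch of adding nodes to a shared linked list with compare-and-set. Note that this is a Treiber-style stack (it pushes at the head) rather than a true queue, which keeps the example short. The class `LockFreeFoundList` is a name I made up, and again this is a guess at the general technique, not Postgres's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch: a lock-free, append-only linked list of found values.
final class LockFreeFoundList<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    // Pushes a value at the head; retries if another thread won the race.
    void push(T value) {
        Node<T> current;
        Node<T> updated;
        do {
            current = head.get();
            updated = new Node<>(value, current);
        } while (!head.compareAndSet(current, updated));
    }

    // Walks the list from the head; the most recently pushed value comes first.
    List<T> toList() {
        List<T> out = new ArrayList<>();
        for (Node<T> n = head.get(); n != null; n = n.next) {
            out.add(n.value);
        }
        return out;
    }
}
```

A worker would call `push` only when the dictionary lookup reports a value as new. If you need FIFO order in Java, the standard library's `ConcurrentLinkedQueue` is a ready-made lock-free queue.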