|
|
|
|
|
by cogman10
1346 days ago
|
|
Couldn't tell you the when, but I can tell you the how is likely how you'd expect. Generally speaking, to do distinct you need a dictionary to look up previously seen values. To do it in parallel you need to make that dictionary thread safe. For Java, such a thread safe dictionary is made by segmenting the table and synchronizing on the segments. So you'd hash your values, figure out which segment that targets, lock that segment, and then read/update that segment to contain the new value. I'd assume that postgres is doing a fairly similar trick, The only additional synchronization would be on a linked list of found values. In that case, you could either lock the list and update as new values come in, you could sort those values after the fact, or you could employ a lock free algorithm to add nodes to the list (see lock free queue implementations). |
|