Hacker News new | ask | show | jobs
by Oxryly 5687 days ago
Even faster is to fill a 32-bit mask with "seen" characters for each string. AND them together and compare result to the mask from the second string.
2 comments

Maybe. You need 256 bits for all the possible chars, and this solution is perhaps a little less intuitive. Also, the shift + and + cmpz is not necessarily going to be significantly faster than an address lookup + cmpz because of the latency of getting the array from memory or cache. Though it will use less cache space compared to the size of modern caches that's not enough of a win to matter much I would guess.
I was assuming a 26 letter alphabet. At that point the shifting, anding and comparing would take place in registers.
True, although if there are lower case letters too it could become a problem. At any rate, I'm not sure that the register aspect would improve speed significantly because of the latency of getting the string from main memory. You're still going to get pipeline stalls as you reach portions of the string that are not in the cache.
Yeah, the registers will only be faster than manipulating some sort of bool array. Reading the original string (with associated cache stalls) will certainly be necessary.
Doing it this way will use the optimal amount of memory (in the worst case) (assuming an O(n+m) answer). There are exactly 26 bits of state that need to be tracked (where "bit" is used in the information theoretic sense of the word). If you choose to store this state as 26 actual bits, then you win.

Multiplying together the first 26 primes requires that you can store a number up to 232862364358497360900063316880507363070. log2 of that number is about 127.5. So you need 128-bits of storage.

Essentially, the correct solution is to sort both lists, and then do a single pass through checking for unique items in one list. Once you have this solution, you can do some micro-optimizations: - note that the set of symbols is limited, so you can use a counting sort (or other non comparison based "sort") - note that the sorting function doesn't need to be an actual sorting algorithm, and may discard duplicates.

If you apply these two optimizations, you should end up with the bit-field approach (or maybe a slightly more memory hungry one that doesn't store stuff in individual bits... but algorithmically it doesn't bring anything new to the table).

For some reason, a minority of people are then thinking that a further optimization is to use a more inefficient data structure in the sorting algorithm for storing whether or not a letter has been seen. This has the effect of a more expensive read operation, write operation, more memory usage and provides no other benefits.

If you're marking used letters in a mask, why sort first?
It would depend on the hardware used.

Standard counter-intuitive example (from the game development domain) is how to check if an array of integers has a zero in it. The answer, when all things are considered, is to scan through an entire array making a simple arithmetic for each item and one comparison at the end rather than compare each element.

The same may happen with your example. Filling up 32-bit mask might be more expensive than to flip an element in a boolean array. That's not even considering that the array option allows for early termination of the second loop.

Early termination is the biggest advantage of the array version. The 32-bit mask is probably the fastest possibility since it will reside in a register.