
Doing it this way will use the optimal amount of memory (in the worst case) (assuming an O(n+m) answer). There are exactly 26 bits of state that need to be tracked (where "bit" is used in the information theoretic sense of the word). If you choose to store this state as 26 actual bits, then you win.
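To make the 26-bit idea concrete, here is a minimal sketch. It assumes the task is checking whether every letter of one string occurs in another, and that only the letters a-z matter (case-insensitive). The names `letter_mask` and `contains_all_letters` are made up for illustration.

```python
def letter_mask(s: str) -> int:
    """Return an integer whose bit i is set if the i-th letter of a-z occurs in s."""
    mask = 0
    for ch in s.lower():
        if "a" <= ch <= "z":  # assumption: anything outside a-z is ignored
            mask |= 1 << (ord(ch) - ord("a"))
    return mask


def contains_all_letters(haystack: str, needle: str) -> bool:
    """True if every letter of needle also appears somewhere in haystack."""
    # Bits set in needle's mask but not in haystack's mask are the missing letters.
    return (letter_mask(needle) & ~letter_mask(haystack)) == 0
```

Each string is scanned once, so this is O(n+m) time, and the only state is one 26-bit mask per string.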

Multiplying together the first 26 primes requires that you can store a number up to 232862364358497360900063316880507363070. log2 of that number is about 127.5. So you need 128 bits of storage.
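The arithmetic is easy to check. A quick sketch (the helper `first_primes` is just a naive trial-division generator written for this check; `math.prod` needs Python 3.8+):

```python
from math import prod


def first_primes(count: int) -> list[int]:
    """Return the first `count` primes using naive trial division."""
    primes: list[int] = []
    candidate = 2
    while len(primes) < count:
        # candidate is prime if no smaller prime divides it
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


primes = first_primes(26)
product = prod(primes)

print(primes[-1])            # largest of the 26 primes
print(product)               # the worst-case product (each letter seen once)
print(product.bit_length())  # bits needed to store it
```

Note this only covers the case where each letter's prime is multiplied in at most once. If repeated letters multiply the prime in again, the product grows without bound.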

Essentially, the correct solution is to sort both lists, and then do a single pass through checking for unique items in one list. Once you have this solution, you can do some micro-optimizations:
- note that the set of symbols is limited, so you can use a counting sort (or other non-comparison-based "sort")
- note that the sorting function doesn't need to be an actual sorting algorithm, and may discard duplicates.
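The unoptimized sort-then-scan version might look something like this. It is a sketch only: it assumes the question is "does every item of `required` appear in `available`", and `all_present` is a hypothetical name.

```python
def all_present(required, available) -> bool:
    """True if every item in `required` also occurs in `available`."""
    req = sorted(required)
    avail = sorted(available)
    j = 0  # cursor into avail; it only ever moves forward
    for item in req:
        # Skip everything in avail that sorts before the current item.
        while j < len(avail) and avail[j] < item:
            j += 1
        # Either avail is exhausted or the next candidate is not the item.
        if j == len(avail) or avail[j] != item:
            return False
    return True
```

With a comparison sort this costs O(n log n + m log m). Swapping in a counting "sort" over a fixed alphabet is what brings it down to O(n+m).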

If you apply these two optimizations, you should end up with the bit-field approach (or maybe a slightly more memory hungry one that doesn't store stuff in individual bits... but algorithmically it doesn't bring anything new to the table).

For some reason, a minority of people then think that a further optimization is to use a less efficient data structure in the sorting algorithm for storing whether or not a letter has been seen. This makes both reads and writes more expensive, uses more memory, and provides no other benefits.