Hacker News new | ask | show | jobs
by jcoffland 3722 days ago
__inet_lookup_listener could be modified to use a two level hash that only kicked in for buckets which exceeded a certain threshold. The second level hash tables would xor the parts of the destination IP address modulo the hash table size. Special care would have to be taken for listeners on 0.0.0.0. This would solve the problem with negligible cost to the general case but at the expense of added complexity.
1 comments

In fact wouldn't it be simpler to just use an array with all 64k destination ports for the 1st level, and a hash based on destination IP for the 2nd level? This would only need 64k*sizeof(pointer) memory globally for the 1st level, which sounds reasonable for everything except low-memory embedded devices. Or does cache-locality matter here so much that its worth taking the hash-collision penalty on the 1st level?

For the 2nd level you could size the hash table appropriately since you always know the maximum number of IP addresses a host has.

Would be nice if this 2level array/hash was tunable from /proc or /sys.

Increasing the hash table size would be bad for embedded systems and would not reduce the complexity for problems like the one described here. In general, hash table tuning is challenging but you shouldn't need to make the table significantly larger than some fraction of the total number of entries as long, as your hash function is effective. The problem here is that listeners are differentiated by both destination IP and port but the hash table only uses the port and then does a linear lookup by IP. As was demonstrated the hash function is not always effective and the list of listeners in a bucket can get quite long. Presumably the existing scheme was chosen because it is simple and makes it really easy to handle 0.0.0.0 listeners.

Using a binary tree at each bucket could also work well but you would have to rebalance the tree periodically if listeners were inserted in sorted order. A self balancing tree could be used instead but then again this adds complexity.