by einpoklum
376 days ago
I had a somewhat similar experience when writing a "remove duplicates" extension for Thunderbird: https://addons.thunderbird.net/en-US/thunderbird/addon/remov... To begin with, I used a hash table whose keys came from applying a simplistic digest function to the message headers I was comparing, which gave me a 4-byte hash key. That worked, but was kind of slow. Finally, the idea came to me to not apply _any_ digesting, and to use the concatenated headers of each message directly as the hash key. About 2 KB per hash key! The result: about a 20x perf improvement, if memory serves.

How is that possible? The reason is that the code all runs in a JavaScript engine, and the digest was not a built-in function: it was my own code looping over the headers and doing the arithmetic. Thousands upon thousands of JS abstract machine steps. Using the large hash key may be inefficient in principle, but it's just one JS object / dictionary operation, and that is one of the most heavily optimized operations in any implementation.
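To make the comparison concrete, here is a minimal sketch of the two approaches. It is not the extension's actual code: the message shape, the `FIELDS` list, the FNV-1a-style `digest32`, and both function names are assumptions made up for illustration.

```javascript
// Hypothetical message shape: { id, headers: { from, subject, date } }.
// The field names are illustrative, not the extension's real ones.
const FIELDS = ["from", "subject", "date"];

// Concatenate the compared headers into one long string. The NUL
// separator keeps ("ab", "c") distinct from ("a", "bc").
function combinedKey(msg) {
  return FIELDS.map((f) => msg.headers[f] ?? "").join("\u0000");
}

// Approach 1: a hand-written 32-bit digest (FNV-1a style over UTF-16
// code units). Every character costs a few interpreted JS operations.
function digest32(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0; // force an unsigned 32-bit result
}

function findDuplicatesByDigest(messages) {
  const buckets = new Map(); // digest -> list of { key, ids }
  for (const msg of messages) {
    const key = combinedKey(msg);
    const d = digest32(key);
    let bucket = buckets.get(d);
    if (!bucket) {
      bucket = [];
      buckets.set(d, bucket);
    }
    // A 4-byte digest can collide, so compare the full key as well.
    let entry = bucket.find((e) => e.key === key);
    if (!entry) {
      entry = { key, ids: [] };
      bucket.push(entry);
    }
    entry.ids.push(msg.id);
  }
  const groups = [];
  for (const bucket of buckets.values()) {
    for (const e of bucket) {
      if (e.ids.length > 1) groups.push(e.ids);
    }
  }
  return groups;
}

// Approach 2: no digest at all. The long string itself is the Map key,
// so hashing and comparison happen inside the engine's built-in Map.
function findDuplicatesByFullKey(messages) {
  const groups = new Map(); // combined headers -> list of message ids
  for (const msg of messages) {
    const key = combinedKey(msg);
    const ids = groups.get(key);
    if (ids) ids.push(msg.id);
    else groups.set(key, [msg.id]);
  }
  return [...groups.values()].filter((ids) => ids.length > 1);
}
```

Both functions return the same groups of duplicate ids. The difference is only where the per-character work happens: in a user-level loop for the first, inside the engine's native string hashing for the second. How large the gap is will depend on the engine and the data, so treat the 20x figure as one recollection rather than something this sketch demonstrates.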