|
|
|
|
|
by throwaway2037
375 days ago
|
|
> where linear search beats constant time maps
Can you give an example? You said lots of good things in your post, but I struggling to believe this one. Also, it would help to see some wall clock times or real world impact. |
|
Some example workloads include:
1. Tokenization (checking if a word is a keyword)
2. Interpretation (mapping an instruction name to its action)
3. ML (encoding low-cardinality string features in something like catboost)
4. JSON parsers (usually key count is low, so parse into a linear-scan hashmap rather than a general-purpose hashmap)
Details vary in the exact workload, the hardware you're using, what other instructions you're mixing in, etc. It's a well-known phenomenon though, and when you're doing a microoptimization pass it's definitely something to consider. 2x speedups are common. 10x or more happen from time to time.
It's similar to (but not _quite_ the same as) the reason real-world binary search uses linear scans for small element counts.
When you go to really optimize the system, you'll also find that the linear scan solution is often more amenable to performance improvements from batching.
As to how much it matters for your composite program? Even at a microoptimization level I think it's much more important to pay attention to memory access patterns. When we wrote our protobuf parser that's all we really had to care about to improve performance (33% less execution time for the entire workload, proto parsing being much better than that). You're much more likely to be able to write sane code that way (contrasted with focusing on instructions and whatnot first), and it's easier to layer CPU improvements on top of a good memory access pattern than to go the other way around.