|
|
|
|
|
by ashvardanian
238 days ago
|
|
Storage, strings, sorting, counting, bioinformatics... I got nerd-sniped! Can't resist a shameless plug here :) Looking at the code, there are a few things I would consider optimizing. I'd start by trying (my) StringZilla for hashing and sorting. HashBrown collections under the hood use aHash, which is an excellent hash function, but on both short and long inputs, on new CPUs, StringZilla seems faster [0]: short long
aHash::hash_one 1.23 GiB/s 8.61 GiB/s
stringzilla::hash 1.84 GiB/s 11.38 GiB/s
A similar story with sorting strings. Inner loops of arbitrary length string comparisons often dominate such workloads. Doing it in a more Radix-style fashion can 4x your performance [1]: short long
std::sort_unstable_by_key ~54.35 M compares/s 57.70 M compares/s
stringzilla::argsort_permutation ~213.73 M compares/s 74.64 M compares/s
Bear in mind that "compares/s" is a made-up metric here; in reality, I'm comparing from the duration.[0] https://github.com/ashvardanian/StringWars?tab=readme-ov-fil... [1] https://github.com/ashvardanian/StringWars?tab=readme-ov-fil... |
|