Hacker News new | ask | show | jobs
by gvinciguerra 1965 days ago
You are right, variable-length strings are difficult. You could try to pack as many characters as possible in a computer word (or in a big int data type), say P characters, and then use the PGM-index to find the strings that share a prefix of P chars with the given query string. I discussed this solution in a GitHub issue (https://github.com/gvinciguerra/PGM-index/issues/8#issuecomm...). It may work in practice, but it's far from being adequate if compared to trie data structures (and their recent advancements).
1 comments

Hi Giorgio, thanks for sharing!

You mentioned that tries have had recent advancements, could you please point me to a paper about these advancements so I can learn more? I did a quick Google search but wasn't able to find anything that seemed relevant.

Hi @Gh0stRAT, you are very welcome! For prefix search on strings, I recommend the classic String B-tree paper (https://dl.acm.org/doi/10.1145/301970.301973). Among recent results, there's the c-trie++ paper (https://arxiv.org/pdf/1904.07467.pdf) and the papers mentioned in their Related Work section.