| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by anaphor 4660 days ago
	Or you decide that your problem is small enough that the poor memory complexity doesn't matter much.

1 comments

supo 4660 days ago

Sure, but it can blow up faster than you think. Say you have 100k URL strings, up to 128 chars each, with 64 allowed characters in alphabet.

- storing this at byte per char = 12.8MB

- storing this in a basic trie = 12.8 * 64 = 820MB

- storing this in the TST = 12.8 * 3 = 38MB

link

anaphor 4660 days ago

Sure, I was thinking on the order of <100 strings around 5-10 characters each at most. I agree that the TST is much better suited for large sets of strings. Also I think more developers should actually do the calculations even if they only have an estimate of how large the inputs will be.

link

supo 4660 days ago

Ah, in that case it goes straight to set<string> for me :-) I think that in practice you want to do that but then drop in a few asserts to check whether things are not getting out of control once you forgot about this decision.

link

jdmichal 4660 days ago

Is this taking into account any of the memory savings that would be seen from common paths using the same memory? If a lot of entries have similar prefixes, this can lead to a lot of deduplication.

link

supo 4660 days ago

Depends on the randomness of the paths. With URLs coming from a limited set of websites - yes, there would be some savings.

With a set of random strings, for alphabet with 64 chars, there are 16M different 4 character prefixes, so the savings from overlapping prefixes for 128char long strings is likely less than 3%.

link