| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by dclara 4567 days ago

I have no intent to choose complicated solutions over a simple issue. First, I'm always concerned about the size of the data set. The first sentence I said, if the data set is not big enough, you cannot tell too much difference. Second, I didn't insist on any solutions, so far, I proposed three of them and would like to discuss with you guys on the pros and cons under different conditions. Some people in other posts told me that the first solution using sorting + binary search actually is way faster than hashmap, which deserves more than 2000 points. I tried to verify whether it's true, but nobody answered. Some people helped to provide the Big O comparison on the hash lookup algorithm and linklist, which is also constructive. But did you learn from it that building hashmap cost more than O(1)? Lastly, I proposed the Java solution with string index, did you ever hear about that?

I don't need to spend too much time on finding the solutions, they are on top of my head. Depending on different conditions I'll use different solutions. 4GB is the upper bound of string indexing, if it's being used for web indexing, it's still not enough. If in this case it is used in document indexing for enterprise level with a few MG, it's fine for using any of them. But the difference is the score you get.

I guess eventually you will pick hashmap solution because you have the indexes built in the lookup, which makes more sense over other solutions, but you (or they) don't give the condition out in the first place. How can we discuss based on that? Looks like I pretty much wasted my time on this issue and was taught to learn that making things simple is better. Thank you.