Hacker News new | ask | show | jobs
by methodover 2471 days ago
> But having a deep understanding of data structures (not just arrays and maps) and algorithms really gives you a mastery of your craft, especially around performance and scalability.

Can you give an example?

2 comments

Ok.. Find all the words that match a search string prefix.

I.e. you have a search field and want to show the possible matches as you type. With each new char you’re calling this search and getting back a list of words. The dictionary is in memory in whatever data structure you choose. You only have to return matches if the search is 3 or more chars long.

Now, how does your implementation scale when the dictionary has a million words?

Prefix tree is the right answer here right? From an interview standpoint.

But my question is: why would you want to roll your own here? Wouldn’t the appropriate move be to look for solutions for your specific use case that’ve already been made? Ie if you’re talking web, I see no reason why a standard database index would work. MySQL’s index, a b-tree, is perfectly serviceable for the question, no? I’m having trouble imaging a situation where an out of the box solution wouldn’t work...

I don’t think I have a deep understanding of efficient data structures. I couldn’t re-implement a red black tree for example without looking it up. I wonder: is it enough to have a cursory understanding of things like runtime complexity, space complexity; understand the different basic data structures and how they have trade offs that can help or hinder different operations, and how that connects to real world work. Especially how different actual solutions use different data structures internally. What I don’t understand is the value from going from cursory understanding of data structures to deep understanding.

FWIW, even highly accomplished software developers / hackers like Jonathan Blow[1] have said that in almost every case, you should just use an array as your data structure (or some variation of an array), unless you actually have a good reason not to (that will get you a lot of value)

Of course that doesn't mean you shouldn't know how advanced data structures work and be able to work with them, it just means you shouldn't reach for say, a fibonacci heap before you just use an array.

[1] https://en.m.wikipedia.org/wiki/Jonathan_Blow

> Of course that doesn't mean you shouldn't know how advanced data structures work and be able to work with them..

We don't disagree. Understanding data structures is about reaching for the right tool, and very often the hammer of an array/map is the right tool. But the problem I gave isn't that exotic. It's a case where "reaching for an array" is a brute force solution that won't scale.

So how do you separate candidates who understand those limitations from those that don't? By asking about those cases. They're not that common, but that doesn't mean they never come up or you won't get a nasty bug if you don't understand these foundations.

I would start by asking the question without mentioning scaling at all, and see how they respond.

If they immediately jump to using a fancy data structure like a suffix array, but they don't ask any questions about how critical performance and scaling are, then it shows they know a lot of stuff about CS, but they may be lacking in more practical experience.

If they tell you they would just use an array or a map (which would be extremely inefficient with large amounts of data), then you ask a follow-up question about scaling, and see how they respond. If they can't answer that question, then they lack practical experience and fundamental knowledge of advanced data structures.

Do you agree with that approach?

I totally agree with asking the basic question and then adding scale and complexity.

I don't know if I agree that using a trie at the outset reflects poorly on a candidate. They might see where you are going with the question, even though "you didn't mention scale." Good candidates are going to think about scale at least a little bit. That's not necessarily "lacking practical experience." It's not like basic tries or linked lists are super complex "fancy" data structures.

I think the key is for both parties to communicate thought process. If you're concerned about overengineering then prompt them to explain why they didn't choose an array.

Outside of special cases you would be doing lookup in some data store so this has fairly minimal chance of mattering in a real world project.
Load it all into Elasticsearch and just do an API call.
Network latency is an issue then. The word list is going to lag behind keystrokes. Also, doesn't work offline.
If you assume your corpus fits in the same instance (since you mention offline), you can run a single node elasticsearch on the same instance as well.
No, you can't run elasticsearch offline on a web client.

Look, we can keep going back and forth and maybe you'll dig up some javascript library that implements a search. And that's fine, I believe there are "maker" roles that largely are about gluing things together and translating requirements into code and shouldn't require screening for CS fundamentals. But at some point it gets a little hacky and suboptimal if you lack a good practical grasp of computer science.

Ha, yeah, I might say something similar, except that I would say, "mostly just arrays and maps".