Hacker News new | ask | show | jobs
by anonymoushn 5532 days ago
First, 'hash table' isn't a sorting algorithm. If you apply 'hash table' to an array the result is not a sorted array.

Second, assuming that numberOfYearsOld takes on at most 100 unique values (which I think is reasonable), the best you can hope for is for every hash table insertion and lookup to require you to traverse an average of 5000 students (because on average you will have to look through half of a list of 10,000 students). If you used this as an intermediate step in a real sorting algorithm, your sort would take O(n^2) time.

1 comments

Using a hash table as auxiliary storage is a perfectly reasonable 'sorting algorithm'. He's not asking you to name one of the common sorting algorithms. He's asking you to solve a sorting problem, with a loosening of the usual limited memory constraint.

Once you've scanned the list, why do you have to traverse the 100 lists at all (unless you want to further sort them on 'name')? You just need to merge all the lists (In the mom case, simple concatenate all the stacks). This is a constant time operation.

The 'trick' here is to think about how would you implement a sort if you had no memory constraints. As the blogger rightly says, engineers will usually pick one of the standard ones without thinking (I would surely do the same).

I guess a candidate could say "I would solve this problem by making a hash table keyed on the number of years, except that unlike the hash table that ships with my language, this hash table will expose its constituent linked lists to me directly, guarantee that each unique key maps to a different bucket, and not enforce uniqueness of elements (so as to avoid performing a linear scan of the element's bucket upon insertion)." I probably wouldn't reject someone who said that, but I would wonder what prompted such a complex answer.
- Which programming language 'complicates' storing list objects in a hash map? Downright trivial in all languages I can think in (Java, C++, Perl, Ruby, Python)

- What do you mean by uniqueness of elements? There's no mention of sorting by a secondary criterion, if that's what you mean. Which 'standard' sorting algorithm removes duplicates?

- What do you mean by 'unique key maps to a different bucket'? You'd only have to worry about this if you're designing a hashing algorithm, clearly that's not the problem here. You're hashing on age (hence guaranteeing partition on age), and simply adding objects to a list in the hash. Are you implying we'll have to invent a new hash map to solve this problem?

This is literally less than 10 LoC in any high-level language. That 'complexity' surely seems worth O(n) running time?

My understanding of bnoordhuis' proposed was that the elements of the hash table would be students, rather than lists of students.
The language was C and my solution (adapted to the article) was something like this:

    // TODO enforce! see also: soylent green
    #define MAX_AGE 100

    struct student {
        const char *name;
        int age;
    };

    struct students {
        long n_students;
        struct student *students[10e6];
    };

    struct students all_students[MAX_AGE];

    void insert(struct student *s) {
        struct students *ss = &all_students[s->age];
        ss->students[ss->n_students++] = s;
    }
So O(1) insertion and O(n) post-processing to stitch the arrays together again. Seems like pretty acceptable performance to me but the Google guy thought otherwise, the answer he was looking for was (probably) a counting sort.

I tried to argue that my solution was a fair bit faster and simpler to boot. That's probably why I didn't get a follow-up interview - no-one likes a smart arse. :-)

Thanks, I'm sorry I misunderstood your brief description.

There's no hash table here, this is just bucket sort. I don't think anyone should care about the difference between counting sort and bucket sort, though. The only thing I would mind about this is that it allocates 100x the necessary memory, but I'm sure you wouldn't actually do that when it matters.