Hacker News new | ask | show | jobs
by bnoordhuis 5532 days ago
> His grandmother knew radix sort.

Radix sort? I read it as a trick question where the answer is 'hash table with numberOfYearsOld as the key'.

I suggested that as the solution to an almost identical question when interviewing with Google - and it got rejected by the interviewer. Ah, the arbitrariness of it all!

2 comments

First, 'hash table' isn't a sorting algorithm. If you apply 'hash table' to an array the result is not a sorted array.

Second, assuming that numberOfYearsOld takes on at most 100 unique values (which I think is reasonable), the best you can hope for is for every hash table insertion and lookup to require you to traverse an average of 5000 students (because on average you will have to look through half of a list of 10,000 students). If you used this as an intermediate step in a real sorting algorithm, your sort would take O(n^2) time.

Using a hash table as auxiliary storage is a perfectly reasonable 'sorting algorithm'. He's not asking you to name one of the common sorting algorithms. He's asking you to solve a sorting problem, with a loosening of the usual limited memory constraint.

Once you've scanned the list, why do you have to traverse the 100 lists at all (unless you want to further sort them on 'name')? You just need to merge all the lists (In the mom case, simple concatenate all the stacks). This is a constant time operation.

The 'trick' here is to think about how would you implement a sort if you had no memory constraints. As the blogger rightly says, engineers will usually pick one of the standard ones without thinking (I would surely do the same).

I guess a candidate could say "I would solve this problem by making a hash table keyed on the number of years, except that unlike the hash table that ships with my language, this hash table will expose its constituent linked lists to me directly, guarantee that each unique key maps to a different bucket, and not enforce uniqueness of elements (so as to avoid performing a linear scan of the element's bucket upon insertion)." I probably wouldn't reject someone who said that, but I would wonder what prompted such a complex answer.
- Which programming language 'complicates' storing list objects in a hash map? Downright trivial in all languages I can think in (Java, C++, Perl, Ruby, Python)

- What do you mean by uniqueness of elements? There's no mention of sorting by a secondary criterion, if that's what you mean. Which 'standard' sorting algorithm removes duplicates?

- What do you mean by 'unique key maps to a different bucket'? You'd only have to worry about this if you're designing a hashing algorithm, clearly that's not the problem here. You're hashing on age (hence guaranteeing partition on age), and simply adding objects to a list in the hash. Are you implying we'll have to invent a new hash map to solve this problem?

This is literally less than 10 LoC in any high-level language. That 'complexity' surely seems worth O(n) running time?

My understanding of bnoordhuis' proposed was that the elements of the hash table would be students, rather than lists of students.
The language was C and my solution (adapted to the article) was something like this:

    // TODO enforce! see also: soylent green
    #define MAX_AGE 100

    struct student {
        const char *name;
        int age;
    };

    struct students {
        long n_students;
        struct student *students[10e6];
    };

    struct students all_students[MAX_AGE];

    void insert(struct student *s) {
        struct students *ss = &all_students[s->age];
        ss->students[ss->n_students++] = s;
    }
So O(1) insertion and O(n) post-processing to stitch the arrays together again. Seems like pretty acceptable performance to me but the Google guy thought otherwise, the answer he was looking for was (probably) a counting sort.

I tried to argue that my solution was a fair bit faster and simpler to boot. That's probably why I didn't get a follow-up interview - no-one likes a smart arse. :-)

Now, the good answer, and essentially what the grandmother gave, is essentially just old radix sort like punch cards were sorted.

Radix sort? Take a deck of punch cards with a non-negative integer in base 10 right justified in some three columns. Run the cards through the sorter sorting on the least significant digit. The sorter puts each of the cards into the right one of 10 pockets, one for each of 0-9. Or just imagine that it builds 10 'linked lists'. Then move the cards back to the input, 0 pocket first, then the 1 pocket, etc. Then sort on the 10s digit column. Then repeat on the 100s digit column. Done. Just did the work in time proportional to 3*n for n cards.

If have a sorter with 1000 pockets, then can do the sorting in one pass. For the ages, such a sorter would sort all the input in just one pass.

Also cute, radix sort is 'stable' which means if card A starts before card B and if these two cards have the same value on the sort key then at the end card A will still be ahead of card B. Stable sorts are nice because they can permit sorting on several keys separately and, then, have the whole list of records in sort on the 'hierarchical key' made from 'concatenating' the separate keys.

Again, can do it all playing with pointers and linked lists.

It remains, for short keys, radix sort is faster than the (n)ln(n) sorts.

Technically the sort algorithm from the article is called counting sort, not radix sort. While both radix sort and counting sort are O(n) they are fundamentally different algorithms. In practice radix sort can be used for larger domains of values than is feasible with counting sort. Furthermore care must be taken if you want a stable sort since only certain variants of radix sort are stable.
What the grandmother did with ages, say, 0-99, would be radix sort if the sorter had 100 pockets. Then the sort was done in just one pass. I don't recall counting sort, but in this case it sounds like the same thing.

The obvious way to implement radix sort, just think punch cards, is stable with no extra effort.