Hacker News new | ask | show | jobs
by anxrn 5532 days ago
- Which programming language 'complicates' storing list objects in a hash map? Downright trivial in all languages I can think in (Java, C++, Perl, Ruby, Python)

- What do you mean by uniqueness of elements? There's no mention of sorting by a secondary criterion, if that's what you mean. Which 'standard' sorting algorithm removes duplicates?

- What do you mean by 'unique key maps to a different bucket'? You'd only have to worry about this if you're designing a hashing algorithm, clearly that's not the problem here. You're hashing on age (hence guaranteeing partition on age), and simply adding objects to a list in the hash. Are you implying we'll have to invent a new hash map to solve this problem?

This is literally less than 10 LoC in any high-level language. That 'complexity' surely seems worth O(n) running time?

1 comments

My understanding of bnoordhuis' proposed was that the elements of the hash table would be students, rather than lists of students.
The language was C and my solution (adapted to the article) was something like this:

    // TODO enforce! see also: soylent green
    #define MAX_AGE 100

    struct student {
        const char *name;
        int age;
    };

    struct students {
        long n_students;
        struct student *students[10e6];
    };

    struct students all_students[MAX_AGE];

    void insert(struct student *s) {
        struct students *ss = &all_students[s->age];
        ss->students[ss->n_students++] = s;
    }
So O(1) insertion and O(n) post-processing to stitch the arrays together again. Seems like pretty acceptable performance to me but the Google guy thought otherwise, the answer he was looking for was (probably) a counting sort.

I tried to argue that my solution was a fair bit faster and simpler to boot. That's probably why I didn't get a follow-up interview - no-one likes a smart arse. :-)

Thanks, I'm sorry I misunderstood your brief description.

There's no hash table here, this is just bucket sort. I don't think anyone should care about the difference between counting sort and bucket sort, though. The only thing I would mind about this is that it allocates 100x the necessary memory, but I'm sure you wouldn't actually do that when it matters.