by shiandow 170 days ago
Huh, I didn't know the backwards version was more common; that seems odd.

You could also call the last version the online version, since it keeps the partial list uniformly shuffled at every step (so it works for inputs of indeterminate length, for extending an already-shuffled list with new elements, for sampling k elements, etc.).

Not too sure if the enumerate is necessary. I usually dislike using it just to have an index to play around with. A similar way of doing the same thing is:

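    import random

    for x in source:
        a.append(x)
        # randrange excludes its upper bound, so i is always a valid index;
        # randint(0, len(a)) could return len(a) and raise IndexError
        i = random.randrange(len(a))
        a[i], a[-1] = a[-1], a[i]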

Which makes the intention a bit clearer. You could even avoid the swap entirely but you would need to handle the case where i is at the end of the list separately.
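
For what it's worth, a rough sketch of that swap-free variant. The function name `online_shuffle` and the `rng` parameter are made up for illustration; the only assumption is that `source` is an iterable:

```python
import random

def online_shuffle(source, rng=random):
    """Build a shuffled list one element at a time, without swapping."""
    a = []
    for x in source:
        # Pick a slot among the len(a) + 1 positions the new list will have.
        i = rng.randrange(len(a) + 1)
        if i == len(a):
            # Special case: the new element simply lands at the end.
            a.append(x)
        else:
            # Move the current occupant of slot i to the end, then
            # put the new element in its place.
            a.append(a[i])
            a[i] = x
    return a
```

It does the same thing as the append-and-swap loop, just with the "i is the last index" case spelled out.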
2 comments

> sample k elements

Not quite sure what you have in mind here, but you need reservoir sampling for this in order to make the selection uniformly random (which I assume is what's desired).

You can just use this algorithm but ignore everything after the first k elements. The algorithm still works if you don't store anything beyond the first k elements but just pretend they are there.
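
Roughly like this, as a sketch rather than anything definitive. `sample_k` and `rng` are names invented here, and an explicit counter is needed because `len(reservoir)` stops growing once it reaches k:

```python
import random

def sample_k(source, k, rng=random):
    """Keep only the first k slots of the online shuffle."""
    reservoir = []
    for n, x in enumerate(source):
        # Slot the new element would be swapped into in the full shuffle,
        # where the list would have n + 1 elements at this point.
        i = rng.randrange(n + 1)
        if n < k:
            # Still filling up: identical to the full append-and-swap step.
            reservoir.append(x)
            reservoir[i], reservoir[-1] = reservoir[-1], reservoir[i]
        elif i < k:
            # The swap touches a stored slot; the displaced element would
            # move to position n, which we only pretend exists.
            reservoir[i] = x
        # Otherwise the swap happens entirely in the part we don't store.
    return reservoir
```

Once the first k slots are full this is the usual reservoir sampling update, with the side effect that the sample also comes out in random order.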

enumerate() is just an awkward way to get len(a). In theory, you could somehow be in an environment where you have dynamically resizing arrays (vectors) that don't track their length internally. But in this case it's probably because OP doesn't have a firm grasp of what's happening (which is why they wrote the blog post).