Hacker News
by crazygringo 972 days ago
Precisely because of the gaps. Tons of Wikipedia article IDs aren't valid for random selection because they've been deleted, or because they belong to a disambiguation page, a redirect, a talk page, a user page, or whatever else.

My comment covered your suggestion already -- that's why I wrote "you encounter the same problem where the id gaps in deleted articles make certain articles more likely to be chosen".
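To make the gap problem concrete, here is a minimal sketch. It assumes the naive scheme is "draw a uniform number in 1..max_id and take the next valid ID at or above it". The function name and the toy ID list are made up for illustration and are not from Wikipedia's code.

```python
from bisect import bisect_left
from collections import Counter


def next_valid_id_probabilities(valid_ids, max_id):
    """Exact chance of landing on each valid ID when a uniform draw
    from 1..max_id is rounded up to the next valid ID (hypothetical scheme)."""
    ids = sorted(valid_ids)
    counts = Counter()
    for draw in range(1, max_id + 1):
        pos = bisect_left(ids, draw)  # index of first valid ID >= draw
        if pos == len(ids):
            pos = 0  # wrap around if the draw is past the last valid ID
        counts[ids[pos]] += 1
    return {i: counts[i] / max_id for i in ids}


# IDs 3..6 were deleted, so ID 7 absorbs every draw that lands in the gap.
probs = next_valid_id_probabilities([1, 2, 7, 8], max_id=8)
print(probs)
```

With four valid IDs you would want each to come up a quarter of the time, but the ID sitting right after the gap is picked for five of the eight possible draws.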


Can't you just query the ID column and grab the entire list of valid IDs, put that into an array, store it, and pick random IDs from that?
That requires setting up an entirely separate service and somehow keeping it in perfect sync with the database, on top of all the memory it requires.

And you've still got to decide how you're going to pick random IDs from an array of tens of millions of elements that are constantly having elements deleted from the middle. Once you've figured out how to do that efficiently, you might as well skip all the trouble and just use that algorithm on the database itself.
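For the in-memory case, one well-known way to handle deletions from the middle is "swap with the last element, then pop", plus a dict that remembers each ID's position. This is only a sketch under that assumption. The class name `RandomIdPool` is hypothetical, and it does nothing about the sync and memory costs mentioned above.

```python
import random


class RandomIdPool:
    """Set of IDs with O(1) add, O(1) remove, and O(1) uniform random pick."""

    def __init__(self, ids=()):
        self._ids = []   # dense list of the currently valid IDs
        self._pos = {}   # ID -> its index in self._ids
        for id_ in ids:
            self.add(id_)

    def __len__(self):
        return len(self._ids)

    def add(self, id_):
        if id_ in self._pos:
            return  # already present
        self._pos[id_] = len(self._ids)
        self._ids.append(id_)

    def remove(self, id_):
        pos = self._pos.pop(id_)      # raises KeyError if the ID is absent
        last = self._ids.pop()        # take the final element off the list
        if pos < len(self._ids):      # the removed ID was not the final one
            self._ids[pos] = last     # fill the hole with the final element
            self._pos[last] = pos

    def pick(self, rng=random):
        return rng.choice(self._ids)  # uniform over the remaining IDs


pool = RandomIdPool([1, 2, 3, 4, 5])
pool.remove(2)  # delete "from the middle" without shifting anything
print(pool.pick())
```

The order of the list is not preserved, which is fine here because only uniform selection matters.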

>And you've still got to decide how you're going to pick random ID's from an array of tens of millions of elements that are constantly having elements deleted from the middle. Once you've

How so? You have an array of only valid IDs, e.g. [1, 2, 3].

Oh I thought this was a one-time thing, like a research paper. Doing it continuously in real time is much harder.