| HN Mirror



	by jonesnc 972 days ago
	Why does the randomization have to happen in the database query? Assuming there aren't any large gaps in the distribution of IDs, and if you know the max_id, couldn't you pick a random number between min_id and max_id and get the record whose ID matches that random number?

1 comments

crazygringo 972 days ago

Precisely because of the gaps. Tons of Wikipedia article IDs aren't valid for random selection because they've been deleted, because they're a disambiguation page, because they're a redirect, or because they're a talk page, user page, or whatever else.

My comment covered your suggestion already -- that's why I wrote "you encounter the same problem where the id gaps in deleted articles make certain articles more likely to be chosen".
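To make the bias concrete, here is a minimal sketch. It assumes the lookup falls back to the next valid ID when the random number lands in a gap (the kind of `WHERE id >= r ORDER BY id LIMIT 1` query this objection applies to); the ID list and the function names are made up for illustration.

```python
from bisect import bisect_left
from collections import Counter

def pick_next_valid(sorted_ids, r):
    """Return the smallest valid ID >= r (mimics WHERE id >= r ORDER BY id LIMIT 1)."""
    return sorted_ids[bisect_left(sorted_ids, r)]

def selection_counts(sorted_ids):
    """Count how many values of r in [min_id, max_id] map to each valid ID."""
    lo, hi = sorted_ids[0], sorted_ids[-1]
    return Counter(pick_next_valid(sorted_ids, r) for r in range(lo, hi + 1))

# Hypothetical data: IDs 4 through 9 have been deleted.
valid_ids = [1, 2, 3, 10]
counts = selection_counts(valid_ids)
for article_id in valid_ids:
    print(article_id, counts[article_id])
```

With these four IDs, 7 of the 10 possible random numbers resolve to ID 10, so the article sitting right after a gap is picked far more often than the others.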


chongli 971 days ago

Can't you just query the ID column and grab the entire list of valid IDs, put that into an array, store it, and pick random IDs from that?
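As a rough sketch of that idea, assuming a one-time snapshot and a hypothetical `articles` table in an in-memory SQLite database (none of these names come from the thread):

```python
import random
import sqlite3

# Hypothetical stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles (id, title) VALUES (?, ?)",
    [(i, f"Article {i}") for i in range(1, 11)],
)
# Simulate gaps left by deleted articles.
conn.execute("DELETE FROM articles WHERE id IN (4, 5, 6, 9)")

# Grab every valid ID once and keep the list in memory.
valid_ids = [row[0] for row in conn.execute("SELECT id FROM articles ORDER BY id")]

def random_article(conn, valid_ids, rng=random):
    """Pick uniformly from the snapshot of valid IDs, then fetch that row."""
    article_id = rng.choice(valid_ids)
    return conn.execute(
        "SELECT id, title FROM articles WHERE id = ?", (article_id,)
    ).fetchone()

print(random_article(conn, valid_ids))
```

Every valid ID is equally likely here, but only for as long as the snapshot matches the table.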


crazygringo 971 days ago

That requires setting up an entirely different service and somehow keeping it in perfect sync with the database, along with all of the memory it requires.

And you've still got to decide how you're going to pick random IDs from an array of tens of millions of elements that are constantly having elements deleted from the middle. Once you've figured out how to do that efficiently, you might as well skip all the trouble and just use that algorithm on the database itself.
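For the in-memory side of that problem, one common technique (not necessarily what the commenter has in mind) is swap-and-pop: keep the IDs in a list plus a dict from ID to list position, and delete by overwriting the removed slot with the last element. The class name `RandomIdSet` is hypothetical, and the sketch does nothing about keeping the structure in sync with the database.

```python
import random

class RandomIdSet:
    """Set of IDs with O(1) add, O(1) remove, and O(1) uniform random pick."""

    def __init__(self, ids=()):
        self._ids = []   # dense list of valid IDs, in arbitrary order
        self._pos = {}   # ID -> index into self._ids
        for id_ in ids:
            self.add(id_)

    def add(self, id_):
        if id_ in self._pos:
            return
        self._pos[id_] = len(self._ids)
        self._ids.append(id_)

    def remove(self, id_):
        """Remove id_; raises KeyError if it is not present."""
        idx = self._pos.pop(id_)
        last = self._ids.pop()
        if idx < len(self._ids):
            # The removed ID was not the last element: fill its slot with the last one.
            self._ids[idx] = last
            self._pos[last] = idx

    def pick(self, rng=random):
        """Return a uniformly random valid ID."""
        return rng.choice(self._ids)

    def __len__(self):
        return len(self._ids)

ids = RandomIdSet(range(1, 11))
ids.remove(5)   # delete from the middle without shifting the rest
print(len(ids), ids.pick())
```

The list order is not preserved, which is fine for random selection but means the structure cannot double as a sorted index.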


tester756 971 days ago

>And you've still got to decide how you're going to pick random ID's from an array of tens of millions of elements that are constantly having elements deleted from the middle. Once you've

How so? You have an array of only valid IDs, e.g. [1, 2, 3].


chongli 971 days ago

Oh, I thought this was a one-time thing, like a research paper. Doing it continuously in real time is much harder.
