by usman-m
3970 days ago
Oops, it was meant to be: `SELECT <user_id> FROM (SELECT DISTINCT user_id FROM user_actions);`
You're absolutely right that both those queries will give the same result. I guess I was trying to motivate the basic problem of finding whether some user exists in a set of users, and `SELECT DISTINCT` is the SQL way of representing a set. Fixed the post, thanks!

It doesn't help that unnecessary DISTINCTs in subqueries are a common performance problem in novice SQL. Why people write them I don't really understand, but they do.
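Here is a small sketch that makes the DISTINCT point concrete. It uses Python's built-in `sqlite3` module and assumes a `user_actions` table with a `user_id` column, as in the query above. The rows and the helper names (`user_exists`, `user_exists_exists`) are invented for illustration. A membership test gives the same answer with or without `DISTINCT` in the subquery. An `EXISTS` form skips deduplication entirely. Whether a given engine actually does extra work for the redundant `DISTINCT` depends on its planner.

```python
import sqlite3

# In-memory database with a hypothetical user_actions table (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE user_actions (user_id INTEGER, action TEXT)")
conn.executemany(
    "INSERT INTO user_actions (user_id, action) VALUES (?, ?)",
    [(1, "login"), (1, "click"), (2, "login"), (3, "logout"), (3, "login")],
)

def user_exists(conn, user_id, use_distinct):
    """Membership test via IN; the subquery optionally deduplicates first."""
    inner = "SELECT DISTINCT user_id" if use_distinct else "SELECT user_id"
    # `inner` is one of two fixed strings, so this f-string is not an injection risk.
    query = f"SELECT ? IN ({inner} FROM user_actions)"
    (found,) = conn.execute(query, (user_id,)).fetchone()
    return bool(found)

def user_exists_exists(conn, user_id):
    """Membership test via EXISTS: it only needs one matching row, no dedup."""
    query = "SELECT EXISTS (SELECT 1 FROM user_actions WHERE user_id = ?)"
    (found,) = conn.execute(query, (user_id,)).fetchone()
    return bool(found)

for uid in (1, 2, 4):
    print(uid, user_exists(conn, uid, True), user_exists(conn, uid, False),
          user_exists_exists(conn, uid))
```

All three columns agree for every ID. The `DISTINCT` only changes how much work the inner query might do, not the answer.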
That's the thing about probabilistic data structures: I've never seen a real-world performance problem in SQL where they would have been helpful. I would really like an "aha" moment where somebody shows me one.
Probabilistic data structures do seem like a natural match for streaming databases, but that's different.
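The set-membership problem discussed above has a classic probabilistic counterpart: the Bloom filter. What follows is a minimal sketch and is not tied to any database. The class name, the sizes (`num_bits`, `num_hashes`), the SHA-256 double-hashing scheme and the simulated stream of IDs are all illustrative assumptions. For each ID, the filter answers either "definitely not seen" or "probably seen".

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)  # fixed-size bit array

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (double hashing).
        digest = hashlib.sha256(str(item).encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step
        return [(h1 + i * h2) % self.num_bits for i in range(self.num_hashes)]

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False: definitely never added. True: probably added.
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))

# Simulated stream of user_ids (invented data).
seen = BloomFilter()
for user_id in [1, 1, 2, 3, 3, 42]:
    seen.add(user_id)

print(seen.might_contain(42))
print(seen.might_contain(7))
```

The filter stays the same size no matter how many IDs arrive, at the cost of occasional false positives. That trade-off is what makes it attractive on unbounded streams. A database can often answer the same question exactly with an index.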