| Performance implication exist, but it is secondary. Primary reason:
distinct on every select shows either lack of knowledge of schema, in particular which columns make rows unique, or unfortunate schema design.
(Apart from niche cases, schema should be somewhat normal. I.e. column parent_name belongs in the table parent, not in the table student) Select a from x where myuniquekey=1; —- guaranted to return 1 or zero rows, if myuiniquekey is actually unique. Select a from x join y on x.parent_id = y.y_id —- guaranteed to return same amount of rows as exist in y, never more, never duplicates y rows. (N-to-1 relation) If distinct is used in any of above, then question “why?” naturally arises. In more severe case, leads to bugs: Select distinct student.student_name, parent.parent_name from student join parent on student.parent_id = parent.parent_id —- silently discards rows, where by accident student/parent name combo matches several times. Technically sql allows comparing unrelated columns (colour=last_name), but for vast majority of cases, when joining, one of the side should be joined using it’s unique key, and other side should be using it’s foreign key, which ensures that duplicates don’t appear randomly, and thus distinct is not needed. |
Not if distinct is the default.
> Select distinct student.student_name, parent.parent_name from student join parent on student.parent_id = parent.parent_id —- silently discards rows, where by accident student/parent name combo matches several times.
Either with or without distinct can be a bug depending on what you are doing it for.
There are actually 4 variations on what you might want, and you can get all of them with distinct: