DISTINCT generally requires the results to be sorted which has O(n^2) worst performance so it can have a big performance hit on a query. It is best to make your database structure such that queries only return distinct data. E.g. by disallowing duplicates
If your sorting algorithm degrades to anywhere near O(n^2) in pathological cases, you're doing something wrong. And even if it's just a kind of timeout/operations-limit to detect pathological cases and just run an in-place mergesort instead. Tail latency/containing pathological data is quite important if there's any interactivity.
Nearly every time, it's a symptom of bad data normalization.
But every time, it interferes badly with any kind of locking (that's DBMS dependent, of course), and imposes a high performance penalty (on every DBMS).
In order to determine the distinct items, the items need to be deduplicated. Generally that's done in only two ways: a hash table that skips items already seen, or a sort followed by a scan that skips over duplicates. The hash table is O(1), but the sort is easier to make parallel without sharing mutable state and has more established algorithms to use when spilling to disk.
Bingo. I used to work with a guy who would see duplicate results and just throw a distinct on his query. I had to keep on him to fix his queries or explain why distinct was correct in this case. My default is that distinct is almost always not the solution.
In your example you must also have another table, 'sample' with all the samples. So yes, you would use an exists or in subquery with the table you suggested.