by naasking 927 days ago
> In light of this, it should be obvious why duplicates make no sense in a relation. Saying something twice doesn't make it more true!

Saying something twice doesn't make it more true, but that could just as well mean that duplicates should be benign. After all, "true AND true" is still true. So I don't think the conclusion that duplicates make no sense follows from this fact; rather, it entails that duplicates should have no effect (idempotency).
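To make the idempotency point concrete, here is a minimal sketch. The facts and the `holds` helper are made up for illustration. Truth-style questions give the same answer over a bag and over a set, and only counting can tell the two apart.

```python
# Hypothetical facts; the first one is stated twice.
facts_bag = [("alice", "admin"), ("alice", "admin"), ("bob", "user")]
facts_set = set(facts_bag)

def holds(facts, fact):
    """A fact is true if it appears at least once."""
    return fact in facts

queries = [("alice", "admin"), ("bob", "user"), ("carol", "user")]

# Every truth query gives the same answer with or without duplicates.
same_truth = all(holds(facts_bag, q) == holds(facts_set, q) for q in queries)

# Only cardinality is sensitive to the repeated fact.
bag_count = len(facts_bag)
set_count = len(facts_set)
```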

In fact, a higher-performance implementation is possible if we permit duplicates at some levels of the system, because then we don't need to check for them there. If duplicates have to be removed, that can arguably be done at the final materialization stage, or at whatever other stage makes the most sense and minimizes overall work.
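A rough sketch of what "deduplicate only at materialization" could look like. The operator names (`project`, `union_all`, `materialize`) and the sample rows are assumptions for illustration, not taken from any particular engine. The intermediate stages stream tuples without any duplicate check, and a single pass at the end collapses them.

```python
from itertools import chain

def project(rows, *cols):
    """Projection with bag semantics: emits one tuple per input row, no dedup."""
    for row in rows:
        yield tuple(row[c] for c in cols)

def union_all(*relations):
    """Bag union: plain concatenation, no duplicate check."""
    return chain(*relations)

def materialize(rows):
    """Final stage: remove duplicates once, keeping first-seen order."""
    return list(dict.fromkeys(rows))

# Hypothetical input data.
orders_2023 = [{"customer": 1, "item": "a"}, {"customer": 1, "item": "b"}]
orders_2024 = [{"customer": 1, "item": "c"}, {"customer": 2, "item": "a"}]

pipeline = union_all(
    project(orders_2023, "customer"),
    project(orders_2024, "customer"),
)
result = materialize(pipeline)  # the only place duplicates are removed
```

Whether this is actually cheaper depends on how many duplicates flow through the intermediate stages, which is the trade-off behind "where overall work is minimized".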

1 comment

Yeah, I worked on https://tablam.org and https://spacetimedb.com.

It becomes pretty clear that `order` is a significant property for making useful (and performant!) programs. "Duplicates" are also required to make useful programs.

One non-obvious reason for this: you want to report that a `customer` has a duplicated key `1`. If you CAN'T model `[(customer.id = 1), (customer.id = 1)]`, then you can't report errors! And `erroneous` data is VITAL for making useful programs, because otherwise the only possibility is "perfect" data, and that is not possible!
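A small sketch of that duplicate-key report, assuming the rows arrive as plain dicts. The sample data and the `duplicated_keys` helper are hypothetical. The point is that the input has to be a bag for the report to exist at all.

```python
from collections import Counter

# Hypothetical incoming rows; id 1 appears twice, which is the error to report.
customers = [
    {"id": 1, "name": "Ana"},
    {"id": 1, "name": "Ana B."},
    {"id": 2, "name": "Luis"},
]

def duplicated_keys(rows, key):
    """Return {key value: occurrences} for every key seen more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}

report = duplicated_keys(customers, "id")

# If the rows were forced into a mapping keyed by id first, the second
# row would silently overwrite the first and nothing is left to report.
as_mapping = {row["id"]: row for row in customers}
```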

Another reason is that we want to `count` duplicates and to see `duplicates`. And another one, NON-obvious at first: "What is a duplicate?". Have fun with floats, Unicode, and mixing case-sensitive and case-insensitive input... and it becomes obvious that useful programs REQUIRE support for bags in an extended version of the relational model.
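To illustrate how slippery "What is a duplicate?" gets, here is a sketch using only the Python standard library. The `canonical` helper is just one possible equality rule (NFC normalization plus case folding), picked for illustration, and rounding to 9 decimal places is an equally arbitrary choice for the floats.

```python
import unicodedata

def canonical(text):
    """One possible notion of 'same string': NFC-normalize, then case-fold."""
    return unicodedata.normalize("NFC", text).casefold()

# Three spellings of one name: precomposed accent, combining accent, upper case.
names = ["Jos\u00e9", "Jose\u0301", "JOS\u00c9"]

raw_distinct = len(set(names))                          # compared as raw code points
canonical_distinct = len({canonical(n) for n in names})

# Floats: 0.1 + 0.2 and 0.3 are different values unless you pick a tolerance.
float_distinct = len({0.1 + 0.2, 0.3})
rounded_distinct = len({round(0.1 + 0.2, 9), round(0.3, 9)})
```

With raw comparison there are three distinct names and two distinct floats. With the chosen rules each group collapses to one. Neither answer is "the" right one, which is why the raw rows need to survive long enough for you to decide.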

And yet...

It IS very important to remember `set semantics` and to try to adhere to them when it makes sense. Your query planner will like it. Your "valid" constraints like it. And `unique index` likes it. And so on...
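For the `unique index` part, a minimal sketch with Python's built-in `sqlite3` module. The table and index names are made up. Once set semantics is declared on a column, the engine enforces it, and the second insert of the same key is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
# Declare set semantics on customer.id.
conn.execute("CREATE UNIQUE INDEX customer_id_uq ON customer (id)")
conn.execute("INSERT INTO customer VALUES (1, 'Ana')")

try:
    conn.execute("INSERT INTO customer VALUES (1, 'Ana again')")
    rejected = False
except sqlite3.IntegrityError:
    # The duplicate key violates the unique index.
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM customer").fetchone()[0]
```

Note the tension with the error-reporting case: the index keeps the table clean, so if you want to report the bad rows, they have to land somewhere without that constraint first, for example a staging table.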