Hacker News new | ask | show | jobs
by wtfleming 4062 days ago
Maybe i'm misunderstanding, but wouldn't this mean that there would be no way to safely remove a relationship? I guess there are a number of cases where this would be ok.

But people tend to make mistakes. When you discover that someone has added a link between Daniel Day-Lewis and Ernest Goes to Jail all you would need to do is delete a row in the movies_people table. But because of potential collisions in the bit arrays you can't safely remove a relationship?

1 comments

You're indeed correct as this is built upon bloom filters and standard bloom filters don't support deletion[1]. You could reconstruct the bloom filter minus the deleted element from each of the individual relations but then you'd need the relation table.

The other substantial issue with the proposed scheme is false positives. In this situation, you have no way to ensure that a given relation is not an accidental match. For that you would again need the relations table.

What's fun about this story is that the author has inadvertently re-discovered the bloom join[2] - he's just missing one of the important steps. Traditionally bloom filters are used for a fast, initial check which is then followed by a full check to remove false positives.

[1]: There are some modifications to bloom filters that support deletion but they're all more complex than bitstrings.

[2]: http://www.databasejournal.com/features/oracle/oracle-bloom-...

Sometimes a few false positives don't matter. Or matter less than latency. Nobody dies if a query returns Daniel Day Lewis as the star of Ernest Goes to Camp and many people might just write it off as human error...even if it isn't.