Hacker News new | ask | show | jobs
by dspillett 3334 days ago
> it's nonsensical to have a foreign key that isn't indexed. You always want an index on foreign keys

Incorrect, though cases where you don't want the index are rarer than those where you do especially when thinking about simple examples.

> there is never a scenario where you don't want them.

If the parent entities never have their primary key values changed and are never deleted, then the database engine itself will never use the index to enforce the key. If you never need to join from the parent entities to the child entities then your queries are unlikely to make use of it either.

The extra index takes space (maybe a fair amount of space if the key is wide and/or you have a [bad] design with a wide clustering key), potentially space in your in-memory page pool, and processing time & IO during inserts, updates and deletes. If you are unlikely to need the index then why take that hit?

> But while primary keys are indexed by default, foreign keys are not.

Primary keys need to be indexed to avoid a full table scan on every insert (or update of a key value) to that table or any table that refers to the key via a foreign key constraint. Foreign keys will only cause a scan with no index present if a key value is changed (or a row deleted) in the parent table. As some entities shouldn't be deleted and primary key values should be immutable, it follows that this sort of situation can happen.

It would be possible to make the index optional by other means then requiring you to declare it if you want it, but SQL's modelling language tends towards declaring what you want not what you don't want so it fits better with the syntax to have you add the index if you want it rather than deleting it if you don't or having something in the syntax like "ADD CONSTRAINT fk_key_name ON (<field(s)>) REFERENCES <target>(<field(s)>) WITHOUT INDEX".

> And those sorts of things give the anti-RDBMS crowd enough of a foot-hold to argue for continued ignorance.

This isn't purely a relational problem. noSQL data stores have indexes too, and not having an index on a referencing key that might be needed to check on referential integrity (looking for orphan records) can be a problem there too. We shouldn't change the behaviour of relational stores because some people who use noSQL don't understand good data modelling. Many people using noSQL do understand good data modelling of course, but some use noSQL because they don't want to try understand SQL rather than because it is the wrong tool for the job at hand, and those people probably don't understand noSQL either but get away with not doing so in the short term).

1 comments

> If the parent entities never have their primary key values changed and are never deleted, then the database engine itself will never use the index to enforce the key.

Enforcing referential integrity is not the only thing we can do with foreign keys. Once you have an FK, you're going to want to query against it. To do that without data races requires a join. The join can be optimized better if the FK is indexed.

Come up with an example of an FK you never want to join on and then maybe we talk about not wanting an FK indexed. I've never seen anyone make a convincing argument that this is even as much as a 1% case. It should be so incredibly, almost inconceivably rare to not want an FK indexed that yes, I'm saying it should go against the traditional grain and just be the default. Traditions are not sacred.

> Once you have an FK, you're going to want to query against it.

Not necessarily. You are definitely (at a minimum, implicitly for referential integrity) going to want to query from the table using the FK to the table it references, but you may or may not want to query by the FK column.

> Come up with an example of an FK you never want to join on and then maybe we talk about not wanting an FK indexed.

Joining on an FK doesn't require the FK to be indexed, it requires the target to be indexed. You need an index on the FK of the FK is used in a simple equality or inequality filter criteria other than a join to it's target, or if it's used in a join where the other criteria are filtering it's target table rather than the table with the FK. But if you filter the table with the FK and join to it's target, which is a fairly common case, you don't need an FK index.

> Once you have an FK, you're going to want to query against it.

Not necessarily, and even where you do there are many cases where an index just on the foreign key is not the best choice because the queries are filtering on other properties too so a wider index is worth defining instead.

> Come up with an example of an FK you never want to join on

Recording tables with many fact dimensions, where you need to enforce a limited range of values in each dimension column but only ever use the data afterwards for aggregating after filtering/grouping by other properties. The storage needed for the unnecessary indexes structures could balloon significantly here. This is most commonly seen in warehousing situations, but the pattern is far from unheard of in OLTP workloads.

> it's nonsensical to have a foreign key that isn't indexed

> and "there is never a scenario where you don't want them

It may be uncommon to prefer not to have an index on a foreign key, but absolute statements like those are simply wrong because there are cases where you don't want (or at least done need) them and there are cases where you want something other than what would be generated automatically.

Useful rules-of-thumb perhaps but only if presented as such ("in general X" or "X is true except when it isn't due to Y or Z") rather than absolutes.