by EddieJLSH 1054 days ago
Realistically which production DB tables don't have a unique id? Genuine question, never used one in my life.
6 comments

Log analytics or warehouse tables often have no simple useful key for this sort of comparison.

Also, in a more general case, you might be comparing tables that may contain the same data but have been constructed from different sources. Or perhaps a distributed dataset became disconnected and may have seen updates in both partitions, and you have brought them together to compare, to try to decide which to keep or whether it is worth trying to merge. In those and other circumstances there may be a key, but if it is a surrogate key it will be meaningless for comparing data from two sets of updates, so you would have to disregard it and compare on the other data (which might not include useful candidate keys).
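
To make the "disregard the surrogate key and compare on the other data" idea concrete, here is a minimal sketch in Python. It treats each table as a multiset of rows, so duplicate rows are counted rather than collapsed. The function name `diff_keyless`, the `ignore` parameter, and the dict-per-row representation are all assumptions made for illustration, not anything from a particular tool.

```python
from collections import Counter

def diff_keyless(rows_a, rows_b, ignore=()):
    """Compare two collections of dict rows as multisets.

    Columns named in `ignore` (e.g. a surrogate id) are dropped before
    comparing. Values must be hashable. Hypothetical helper for illustration.
    """
    def normalize(row):
        # Sort by column name so column order does not affect equality.
        return tuple(sorted((k, v) for k, v in row.items() if k not in ignore))

    count_a = Counter(normalize(r) for r in rows_a)
    count_b = Counter(normalize(r) for r in rows_b)
    # Counter subtraction keeps only positive counts, i.e. the surplus rows.
    return count_a - count_b, count_b - count_a

# Same data loaded from two sources, with unrelated surrogate ids.
side_a = [{"id": 1, "name": "x", "qty": 2}, {"id": 2, "name": "y", "qty": 3}]
side_b = [{"id": 10, "name": "x", "qty": 2}, {"id": 11, "name": "y", "qty": 4}]
only_a, only_b = diff_keyless(side_a, side_b, ignore=("id",))
```

Here `only_a` and `only_b` each hold one row (the two versions of "y"), while the matching "x" row cancels out even though its ids differ.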

Also, database tables where unique key constraints aren't enforced. Programming and operational mistakes happen. :-)
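
For the unenforced-constraint case, a rough sketch of how you might check whether a column that is supposed to be unique actually is. It uses Python's built-in `sqlite3` purely as a stand-in for whatever database is involved; the function name and the example table and column names are hypothetical.

```python
import sqlite3

def find_duplicate_keys(conn, table, key_column):
    """Return (key, count) pairs for key values that appear more than once.

    `table` and `key_column` are interpolated into the SQL, so they must
    come from trusted code, never from user input.
    """
    query = (
        f"SELECT {key_column}, COUNT(*) FROM {table} "
        f"GROUP BY {key_column} HAVING COUNT(*) > 1"
    )
    return conn.execute(query).fetchall()

# Hypothetical table whose "unique" customer_id was never enforced.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id, name)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)",
    [(1, "a"), (2, "b"), (2, "c")],
)
duplicates = find_duplicate_keys(conn, "customers", "customer_id")
```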

https://stackoverflow.com/questions/62735776/what-is-the-poi...

It happens. I’m currently working on a project where the CRM tool I need to access for data actually does not have a unique id in its db. I have no idea if I will be able to successfully complete the project yet.
Is there any chance that the rows actually do have a unique id, but it's not being displayed without some magic incantation?

Asking because I've seen that before in some software, where it tries to "keep things simple" by default. But that behaviour can be toggled off so it shows the full schema (and data) for those with the need. :)

Sadly, no.

The manufacturer is just really incompetent.

I was told their reason when asked was "it was easier (for us)".

> it was easier (for us)

That's not all that unusual when something gets implemented, as people tend to take the easy approach for things that meet the desired goal.

It just sounds like the spec they were writing to wasn't very clear or it was just a checkbox list of features provided to them by marketing. So "let's get this list done then ship it". ;)

The question is whether it was actually easier.

Even a couple of minutes of extra debugging takes longer than learning how to add a synthetic primary key.
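
As a rough illustration of how little work that is, here is a sketch that retrofits a synthetic key onto an existing keyless table. It assumes SQLite (via Python's built-in `sqlite3`), where a primary key cannot be added with `ALTER TABLE`, so the table is rebuilt instead; other databases have their own syntax. The function name and the `_keyed` suffix are made up for the example.

```python
import sqlite3

def add_synthetic_key(conn, table, columns):
    """Rebuild `table` with an auto-assigned integer primary key named `id`.

    `table` and `columns` are interpolated into the SQL, so they must come
    from trusted code. Column types and indexes are not preserved in this
    simplified sketch.
    """
    cols = ", ".join(columns)
    # In SQLite, INTEGER PRIMARY KEY columns are filled in automatically.
    conn.execute(f"CREATE TABLE {table}_keyed (id INTEGER PRIMARY KEY, {cols})")
    conn.execute(f"INSERT INTO {table}_keyed ({cols}) SELECT {cols} FROM {table}")
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {table}_keyed RENAME TO {table}")
    conn.commit()

# Hypothetical keyless table containing two identical rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (name, value)")
conn.executemany("INSERT INTO events VALUES (?, ?)", [("boot", 1), ("boot", 1)])
add_synthetic_key(conn, "events", ["name", "value"])
rows = conn.execute("SELECT id, name, value FROM events ORDER BY id").fetchall()
```

After the rebuild, the two previously indistinguishable rows can be addressed individually by `id`.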

For example, tables that store huge amounts of log or sensor data, where IDs are not very useful and just increase space usage and decrease performance.
PostTags in the published Stack Overflow schema - https://data.stackexchange.com/stackoverflow/query/edit/1772...

It happens a lot when people are implementing something quickly, and it often happens in linking tables.

Don’t things like BigQuery always allow duplicates?
Good point. I have not used it before, but it looks like you have to add a unique ID if you want one.