I don't think much of the argument that integer IDs reveal too much.
Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record. If you are worried about someone crawling your public API with wget or curl and an incrementing counter you should re-think whether your data are really public or not, or maybe rate-limit anonymous users, etc.
They also reveal something about the total number of records in your database. I guess that could matter in some contexts, but it has never really been an issue in practice for me.
I have definitely used the technique of defining non-overlapping sequences in different shards (with Oracle, not Postgres, but they are very similar in this regard). It worked very well and was easy to reason about.
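A minimal sketch of one way to get non-overlapping sequences, assuming the interleaved variant where every shard uses the same increment and a different start value (in Oracle or Postgres this would roughly be a `CREATE SEQUENCE ... START WITH ... INCREMENT BY ...` per shard). The function name and the shard count of 3 are illustrative assumptions, not something from the setup described above.

```python
from itertools import count, islice


def shard_sequence(shard_index: int, shard_count: int, start: int = 1):
    """Return an endless ID generator for one shard.

    Shard i yields start + i, start + i + shard_count, ... so no two
    shards can ever produce the same value.
    """
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    return count(start + shard_index, shard_count)


# Hypothetical setup: three shards, each handing out its first five IDs.
shards = [list(islice(shard_sequence(i, 3), 5)) for i in range(3)]
for index, ids in enumerate(shards):
    print(f"shard {index}: {ids}")
```

The other common variant is to give each shard its own contiguous range; that is also easy to reason about but needs a decision up front about how large each range should be.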
As a developer, the big issue I have with UUIDs is that they are impossible to read or type. You have to copy/paste and it isn't easy to look at two UUIDs and tell if they are different.
I use integers in general unless the specific use case demands something else.
> Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record
Any information you give to a potentially malicious actor can help them attack you. If you have a choice between leaking information and not leaking information, I can’t imagine why you would ever intentionally go with the former, unless you didn’t actually have a choice (feasibility, etc.).
As an example, maybe I needed to CSRF someone but needed their ID (say, in a b2b app where IDs are not generally public) - with sequential IDs I have a decent chance of cracking it with a small number of requests, especially if it’s a small userbase. Sure, the CSRF was the main issue, but this is a contrived example to illustrate the point.
Admittedly, IDs are oftentimes public information by necessity - but there’s no need to allow them to leak extra information.
As soon as those IDs are used by any other people or business processes in any way whatsoever, their usability starts to matter and arguably in most cases is simply more relevant than a minor hypothetical advantage to an attacker.
For example, if some customer ID is used by customers in communication, e.g. when calling you on the phone over some billing issue, then there are strong advantages if your IDs aren't unnecessarily long and if they include some explicit redundancy (e.g. a check digit computed with the Luhn formula) to protect against communication mistakes.
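To make the check-digit idea concrete, here is a small sketch of the Luhn formula applied to a numeric ID. The function names are made up for illustration, and the ID is assumed to be a plain string of decimal digits with the check digit appended as the last character.

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk the payload right to left, doubling every second digit,
    # starting with the rightmost one.
    for position, char in enumerate(reversed(payload)):
        digit = int(char)
        if position % 2 == 0:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return (10 - total % 10) % 10


def with_check_digit(payload: str) -> str:
    """Append the check digit to a raw numeric ID."""
    return payload + str(luhn_check_digit(payload))


def is_valid(customer_id: str) -> bool:
    """Check that the last digit matches the Luhn digit of the rest."""
    if len(customer_id) < 2 or not (customer_id.isascii() and customer_id.isdigit()):
        return False
    return luhn_check_digit(customer_id[:-1]) == int(customer_id[-1])


print(with_check_digit("7992739871"))
print(is_valid("79927398713"), is_valid("79927398703"))
```

Luhn catches every single mistyped digit and most swaps of two adjacent digits (the known exception is 09 vs 90), which covers the typical errors when someone reads a number over the phone.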
For another example, if your IDs aren't visible to outsiders but are used in your internal business processes, then it may be quite valuable to ensure that IDs of different types (e.g. customer ID vs account ID vs transaction ID) are obviously distinguishable in some way, so that someone seeing XXXXXXX knows that it likely is a transaction ID and definitely can't be a customer ID. It's also quite valuable to ensure that you can't have accidental collisions where the same number is a valid ID in different key tables; otherwise a bug or miscommunication that confuses them results in data corruption or information disclosure instead of simply failing.
So "ID design" deserves some attention from a UX perspective, and blocks of random data aren't optimal UX.
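One possible way to make IDs of different types distinguishable is a short type prefix, sketched below. The prefixes (`cus`, `acc`, `txn`) and the function names are assumptions chosen for the example; the point is only that parsing an ID as the wrong type fails loudly instead of silently hitting a different table.

```python
# Hypothetical prefixes, one per kind of record.
PREFIXES = {"customer": "cus", "account": "acc", "transaction": "txn"}


def format_id(kind: str, number: int) -> str:
    """Render an internal integer key as a typed, human-readable ID."""
    return f"{PREFIXES[kind]}_{number}"


def parse_id(expected_kind: str, text: str) -> int:
    """Return the integer key, or raise if the ID is not of the expected kind."""
    prefix, separator, number = text.partition("_")
    if (
        not separator
        or prefix != PREFIXES[expected_kind]
        or not (number.isascii() and number.isdigit())
    ):
        raise ValueError(f"{text!r} is not a valid {expected_kind} ID")
    return int(number)


print(format_id("transaction", 851))
print(parse_id("transaction", "txn_851"))
```

With this in place, passing `txn_851` to code that expects a customer ID raises a `ValueError` rather than loading customer 851.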
So we run surveys among general and specialized audiences (among other things), and these surveys link to custom scripting, images, videos, etc. The URLs have to be freely accessible, but if they are sequential, anyone can simply try to guess what's in other surveys, potentially getting information about their competitors.
This is an example where you don't need a UUID as the key (since you could have another field that stores this "secret" value), but it makes it very convenient if you do use UUID as primary key by default because you get that "secret" value for free (no need to create another column and index). In my projects I use it by default for all models. It comes in handy. Another use case is needing to know the primary key before inserting into the database (at either the front end or the backend, but typically the backend).
True, the record contains both a classical sequential id and a uuid (to maintain backwards compatibility), but now everything is linked through the uuid instead of the id. Convenient, indeed. And there's never much data associated with a single uuid, so performance is not an issue.
I had this issue in a case I think is interesting: a customer had a database with incremental IDs of a certain product they sold. On a web platform, the product owner in turn could log in and view a list of their products and their status. The id of the product was part of the URL: /product/851. Of course, the product owners could not get any information on IDs they didn’t own, but the numbers gave away info on how many devices existed before them. And they wanted to hide that information.
Of course, there are many ways to solve that situation, but UUIDs are one.
Just pick a random number at the beginning, and start incrementing IDs from there. Like personal checks starting at 1000 so they're always(ish) 4 digit. Of course, maybe pick another starting number that's less obvious.
There's another benefit to UUIDs: you can generate them anywhere, including on the application side.
Doing this on the application side has tremendous benefits for batching, or for inserting objects with relationships at the same time (vs. waiting for the first insert to return an ID to be used in the FK).
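A rough sketch of what that looks like in practice, using Python's built-in `sqlite3` as a stand-in database. The `orders` / `order_items` tables and their columns are invented for the example. Because both the parent and child keys are generated up front, everything can go in one batch without reading back a generated ID.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE orders (id TEXT PRIMARY KEY, customer TEXT NOT NULL);
    CREATE TABLE order_items (
        id TEXT PRIMARY KEY,
        order_id TEXT NOT NULL REFERENCES orders(id),
        sku TEXT NOT NULL
    );
    """
)

# Keys are generated in the application, before anything touches the database.
order_id = str(uuid.uuid4())
items = [(str(uuid.uuid4()), order_id, sku) for sku in ("A-1", "B-2", "C-3")]

# Parent and children are written in a single transaction; the FK value
# is already known, so there is no round trip to fetch a generated ID.
with conn:
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)", (order_id, "acme")
    )
    conn.executemany(
        "INSERT INTO order_items (id, order_id, sku) VALUES (?, ?, ?)", items
    )

item_count = conn.execute(
    "SELECT COUNT(*) FROM order_items WHERE order_id = ?", (order_id,)
).fetchone()[0]
print(item_count)
```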
I think ideally your primary key is whatever makes sense for your performance/data model, and then if you want to delegate authority with UUIDs you do that via a separate mapping.
By separating that out you can get a lot:
1. You can extend your delegate system by modifying the delegate table, rather than having to muddy your data model with authority information
2. You can TTL the mappings in the delegate table
3. You can have many mappings to the same data, but each one can have its own restrictions.
It's a bit more complex, but you end up with a system that really homes in on the benefit that you're bringing up.
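The three points above can be sketched roughly like this. It is an in-memory toy, not a real schema: the class name, the permission strings, and the use of a dict instead of a database table are all assumptions made to keep the example short.

```python
import time
import uuid


class DelegateTable:
    """Maps unguessable UUID tokens to internal primary keys.

    Each mapping carries its own expiry and its own set of permissions,
    so the data model itself holds no authority information.
    """

    def __init__(self):
        self._rows = {}  # token -> (primary_key, expires_at, permissions)

    def grant(self, primary_key, ttl_seconds, permissions=frozenset({"read"})):
        """Create a new mapping for primary_key and return its token."""
        token = str(uuid.uuid4())
        expires_at = time.monotonic() + ttl_seconds
        self._rows[token] = (primary_key, expires_at, frozenset(permissions))
        return token

    def resolve(self, token, permission):
        """Return the primary key if the token is live and allows permission."""
        row = self._rows.get(token)
        if row is None:
            return None
        primary_key, expires_at, permissions = row
        if time.monotonic() >= expires_at:
            del self._rows[token]  # expired mappings are dropped lazily
            return None
        return primary_key if permission in permissions else None


table = DelegateTable()
read_only = table.grant(851, ttl_seconds=60)
read_write = table.grant(851, ttl_seconds=60, permissions={"read", "write"})
expired = table.grant(851, ttl_seconds=-1)
print(table.resolve(read_only, "read"), table.resolve(read_only, "write"))
```

Here the same row (851) has several tokens pointing at it, each with different restrictions, and revoking one is just deleting its mapping.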
We'll have to agree to disagree. Systems like the ones I described are a hell of a lot easier to build on long term while maintaining those invariants.
If your goal is to use the uuid as a delegated capability it's going to be much more complex to use the primary key for your row than to use a separate key.
Not really. You should develop better tooling to visualize debugging information. Today's serious systems (this in my opinion includes e.g. collaborative rich text editors) are just too complicated to eyeball. Pavel, a colleague of mine, is developing a new collaborative rich text editor for OrgPad, and here is how we currently do some testing: https://www.youtube.com/watch?v=VeVcNmNFzmc
We use UUIDs for basically everything. It is simple, and we have good tools for working with UUIDs. When we present any IDs to users, those are URL-safe BASE64 encoded UUIDs. In some places, we have "short" links that are about half the length. They cannot really be guessed but are much shorter to type, copy and paste, or visually compare for our customers, who are not always super computer literate. :-)
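For anyone curious what URL-safe BASE64 encoding of a UUID looks like, here is a minimal sketch. Stripping the `=` padding is an assumption on my part (it is a common choice, but the comment above doesn't say how it is done there), and the function names are made up.

```python
import base64
import uuid


def encode_uuid(value: uuid.UUID) -> str:
    """Encode the 16 raw bytes of a UUID as URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(value.bytes).rstrip(b"=").decode("ascii")


def decode_uuid(text: str) -> uuid.UUID:
    """Restore the padding and turn the text back into a UUID."""
    padding = "=" * (-len(text) % 4)
    return uuid.UUID(bytes=base64.urlsafe_b64decode(text + padding))


original = uuid.uuid4()
short_form = encode_uuid(original)
print(original, short_form, len(short_form))
```

This takes the usual 36-character hex form down to 22 characters while staying reversible.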
Disagree. /user/edit/5 tells me easily what record it is about in the database without having to copy-paste a UUID.
Dev experience truly is a case of death by a thousand cuts. I avoid every little cut I can like the plague so energy goes into making cool stuff.
> Should develop better tooling to visualize debugging information.
Thing is, why would I spend time overengineering tooling I don't need if I can get away with incrementing ids? When optimizing for value, I'd rather spend time solving business problems.
Your editor is very cool btw. And it's clearly a case where incrementing ids are not optimal.
Thank you for the reply. You are totally right about the dev experience and the death by a thousand cuts.
I really was talking about serious/complicated systems, where you want some consistency in the components it is made of even if a particular case could just use an incremental id. In my view, you then don't spend mental energy on switching between models. I have just adjusted/rewritten a part of the system where, for historical reasons, we used logins instead of UUIDs. By using logins in this particular case, we had subtle bugs that wouldn't occur with UUIDs. It would never have happened had we used UUIDs everywhere from the start. Those already have standard validation functions and you don't have to think about stuff that really isn't your business problem. We are also talking about engineering days thrown out of the window just for the change from logins to UUIDs later on. I don't think you waste that much time copying and pasting stuff.
I guess we arrive at the same conclusion but have our differences about the ways leading to it. We both want to focus on business problems as much as possible.
You're not wrong, but (as I suspect is the case with a lot of us) the vast majority of my work is CRUD and I don't reach for heavyweight debugging tools unless printf() or the equivalent fails me (which is rare). Integer IDs work great in this situation.