Hacker News
by throwawayboise 1841 days ago
I don't think a lot of the argument that integer IDs reveal too much.

Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record. If you are worried about someone crawling your public API with wget or curl and an incrementing counter you should re-think whether your data are really public or not, or maybe rate-limit anonymous users, etc.

They also reveal something about the total number of records in your database. I guess that could matter in some contexts, but it's never really been an issue in practice for me.

I have definitely used the technique of defining non-overlapping sequences in different shards (with Oracle, not Postgres, but they are very similar in this regard). It worked very well and was easy to reason about.
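
To illustrate the non-overlapping sequences idea: the comment doesn't say which scheme was used, so this is a minimal sketch assuming the common "same step, different start" layout, where each of N shards steps by N from its own offset. In Oracle or Postgres this would typically be expressed with CREATE SEQUENCE ... START WITH ... INCREMENT BY ...; the Python below only models the arithmetic, and the name shard_sequence is made up for the example.

```python
from itertools import count, islice


def shard_sequence(shard_index: int, shard_count: int, start: int = 1):
    """Return an endless ID generator for one shard.

    Shard i yields start + i, start + i + shard_count, ... so two shards
    with different indexes can never produce the same ID.
    """
    if not 0 <= shard_index < shard_count:
        raise ValueError("shard_index must be in [0, shard_count)")
    return count(start + shard_index, shard_count)


# Three shards, each taking every third integer.
ids_a = list(islice(shard_sequence(0, 3), 4))  # 1, 4, 7, 10
ids_b = list(islice(shard_sequence(1, 3), 4))  # 2, 5, 8, 11
ids_c = list(islice(shard_sequence(2, 3), 4))  # 3, 6, 9, 12
```

The other usual layout is giving each shard its own disjoint range (say 1 to 1 billion, 1 billion + 1 to 2 billion, and so on), which keeps IDs dense per shard but requires picking range sizes up front.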

As a developer, the big issue I have with UUIDs is that they are impossible to read or type. You have to copy/paste and it isn't easy to look at two UUIDs and tell if they are different.

I use integers in general unless the specific use case demands something else.

4 comments

> Yes, they are guessable but your application should not rely solely on the "secrecy" of the ID to authorize access to a record

Any information you give to a potentially malicious actor can help them attack you. If you have a choice between leaking information and not leaking information, I can’t imagine why you would ever intentionally go with the former, unless you didn’t actually have a choice (feasibility, etc.).

As an example, suppose I wanted to CSRF someone but needed their ID (say, in a B2B app where IDs are not generally public). With sequential IDs I have a decent chance of guessing it in a small number of requests, especially if the userbase is small. Sure, the CSRF was the main issue, but this is a contrived example to illustrate the point.

Admittedly, IDs are oftentimes public information by necessity - but there’s no need to allow them to leak extra information.

As soon as those IDs are used by any other people or business processes in any way whatsoever, their usability starts to matter and arguably in most cases is simply more relevant than a minor hypothetical advantage to an attacker.

For example, if some customer ID is used by customers in communication, e.g. when calling you on the phone over some billing issue, then there would be strong advantages if your IDs aren't unnecessarily long and if they include some explicit redundancy (e.g. a check digit computed with the Luhn formula) to protect against communication mistakes.
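
For anyone who hasn't implemented the Luhn check digit mentioned above, here is a minimal sketch. The function names are just for this example; the algorithm itself is the standard one (double every second digit starting from the rightmost payload digit, subtract 9 from any result above 9, and pick the digit that brings the sum to a multiple of 10).

```python
def luhn_check_digit(payload: str) -> int:
    """Compute the Luhn check digit for a string of decimal digits."""
    total = 0
    # Walk right to left; the rightmost payload digit is doubled first.
    for i, ch in enumerate(reversed(payload)):
        d = int(ch)
        if i % 2 == 0:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return (10 - total % 10) % 10


def luhn_is_valid(number: str) -> bool:
    """Check a full ID whose last digit is the Luhn check digit."""
    if not (number.isascii() and number.isdigit()) or len(number) < 2:
        return False
    return luhn_check_digit(number[:-1]) == int(number[-1])


customer_id = "7992739871" + str(luhn_check_digit("7992739871"))
```

A single mistyped or misheard digit always fails the check, which is exactly the phone-call scenario; most (not all) swaps of two adjacent digits are caught as well.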

For another example, if your IDs aren't visible to outsiders but are used in your internal business processes, then it may be quite valuable to ensure that IDs of different types (e.g. customer ID vs account ID vs transaction ID) are obviously distinguishable in some way, so that someone seeing XXXXXXX knows that it likely is a transaction ID and definitely can't be a customer ID. It's also quite valuable to ensure that you can't have accidental collisions where the same number is a valid ID in different key tables; otherwise a bug or miscommunication that confuses them would result in data corruption or information disclosure instead of simply failing.
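
One simple way to get distinguishable ID types is a type prefix that is checked on every parse. This is only a sketch under assumed conventions: the prefixes C/A/T, the hyphen separator, and the function names are all invented for the example, not something from the comment above.

```python
# Hypothetical prefixes, one per ID type.
ID_PREFIXES = {"customer": "C", "account": "A", "transaction": "T"}


def format_id(kind: str, number: int) -> str:
    """Render a numeric key as a typed, human-readable ID such as 'T-851'."""
    return f"{ID_PREFIXES[kind]}-{number}"


def parse_id(expected_kind: str, text: str) -> int:
    """Return the numeric key, or raise if the ID is of the wrong type."""
    prefix, sep, digits = text.partition("-")
    if sep != "-" or prefix != ID_PREFIXES[expected_kind]:
        raise ValueError(f"{text!r} is not a {expected_kind} ID")
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError(f"{text!r} has a malformed number part")
    return int(digits)


transaction_ref = format_id("transaction", 851)
```

With this, passing a transaction ID where a customer ID is expected fails loudly in parse_id instead of silently looking up customer 851.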

So "ID design" deserves some attention from a UX perspective, and blocks of random data aren't optimal UX.

So we run surveys among general and specialized audiences (among other things), and these surveys link to custom scripting, images, videos, etc. The URLs have to be freely accessible, but if they are sequential, anyone can simply try to guess what's in other surveys, potentially getting information about their competitors.
This is an example where you don't need a UUID as the key (since you could have another field that stores this "secret" value), but it makes it very convenient if you do use UUID as primary key by default because you get that "secret" value for free (no need to create another column and index). In my projects I use it by default for all models. It comes in handy. Another use case is needing to know the primary key before inserting into the database (at either the front end or the backend, but typically the backend).
True, the record contains both a classical sequential id and a uuid (to maintain backwards compatibility), but now everything is linked through the uuid instead of the id. Convenient, indeed. And there's never much data associated with a single uuid, so performance is not an issue.
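
The "know the primary key before inserting" point is easy to show concretely. A minimal sketch, assuming an in-memory SQLite database and two made-up tables (survey and asset): because the key is generated in application code, the child row can reference the parent without first asking the database which ID it assigned.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
# Hypothetical schema for the example; UUIDs are stored as text.
conn.execute("CREATE TABLE survey (id TEXT PRIMARY KEY, title TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE asset ("
    "id TEXT PRIMARY KEY, "
    "survey_id TEXT NOT NULL REFERENCES survey(id), "
    "path TEXT NOT NULL)"
)

# Both keys exist before anything touches the database.
survey_id = str(uuid.uuid4())
asset_id = str(uuid.uuid4())

with conn:  # commits both inserts together, or rolls both back on error
    conn.execute(
        "INSERT INTO survey (id, title) VALUES (?, ?)",
        (survey_id, "Example survey"),
    )
    conn.execute(
        "INSERT INTO asset (id, survey_id, path) VALUES (?, ?, ?)",
        (asset_id, survey_id, "intro.mp4"),
    )

row = conn.execute(
    "SELECT survey_id FROM asset WHERE id = ?", (asset_id,)
).fetchone()
```

The same survey_id can go straight into a public URL, which is the "secret value for free" part, provided it is a random (version 4) UUID and not a time-based one.
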
I had this issue in a case I think is interesting; a customer had a database with incremental IDs of a certain product they sold. On a web platform, the product owner in turn could log in and view a list of their products and their status. The id of the product was part of the URL; /product/851. Of course, the product owners could not get any information on IDs they didn’t own, but the numbers gave away info on how many devices existed before them. And they wanted to hide that information.

Of course, there are many ways to solve that situation, but UUIDs are one.

It's the German tank problem.

Serial IDs, with some light assumptions, leak information about the total count of items.
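
For reference, the usual estimator for the German tank problem is small enough to show. The "light assumptions" here are that IDs start at 1, have no gaps, and that the IDs you happened to see are a uniform random sample without replacement; the function name is just for the example.

```python
def estimate_total(observed_ids):
    """Estimate how many records exist, given a few observed serial IDs.

    Uses the standard estimator m + m/k - 1, where m is the largest
    observed ID and k is the number of distinct IDs observed.
    """
    ids = set(observed_ids)
    if not ids:
        raise ValueError("need at least one observed ID")
    k = len(ids)
    m = max(ids)
    return m + m / k - 1


# Four IDs seen in the wild; the largest is 60.
estimate = estimate_total([19, 40, 42, 60])  # 60 + 60/4 - 1
```

So four leaked IDs with a maximum of 60 suggest roughly 74 records in total, and the estimate tightens quickly as more IDs are observed.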

Just pick a random number at the beginning, and start incrementing IDs from there. Like personal checks starting at 1000 so they're always(ish) 4 digit. Of course, maybe pick another starting number that's less obvious.
That's not effective at all.
Still leaks count -- you do similar stuff to estimate the minimum and the maximum.
From a security perspective, you're right.
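
To make the "you estimate the minimum and the maximum" objection concrete: with an unknown starting offset you use the spread between the smallest and largest observed IDs instead of the maximum alone. A minimal sketch under the same assumptions as the German tank problem (gap-free IDs, uniform random sample without replacement); the sample values are made up.

```python
def estimate_count_unknown_start(observed_ids):
    """Estimate the record count when the first ID is an unknown offset.

    For k distinct samples from N consecutive IDs, the expected spread
    max - min is (N + 1)(k - 1)/(k + 1), so N is estimated as
    spread * (k + 1)/(k - 1) - 1.
    """
    ids = set(observed_ids)
    k = len(ids)
    if k < 2:
        raise ValueError("need at least two distinct observed IDs")
    spread = max(ids) - min(ids)
    return spread * (k + 1) / (k - 1) - 1


# Hypothetical sample from a table whose IDs start somewhere near 1000.
estimate_with_offset = estimate_count_unknown_start([1010, 1035, 1060, 1070])
```

Here four observed IDs give an estimate of 99 records without knowing where the sequence started, so the random offset only costs the attacker one extra sample's worth of information.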

They can inherently leak how much data you do or don't have, which you may not want your competitors to know.