| HN Mirror


	by GGO 711 days ago
	I don't understand the recommendation of using bigserial with a UUID column when you can use UUIDv7. I get that it made sense years ago when there was no UUIDv7, but why people keep recommending it over UUIDv7 now beats me.

3 comments

whynotmaybe 711 days ago

Because UUIDv7 values hold time information, they can help bad actors with timing attacks or pattern recognition: each one contains a timestamp linked to the record.

You can infer how much time passed in the system between two UUIDv7 IDs.

They can only be used if they're not shown to the user (so not in the form mysite.com/mypage?id=0190854d-7f9f-78fc-b9bc-598867ebf39a).

A bigserial starting at a high number can't provide the time information.
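To make the timestamp point concrete, here is a minimal Python sketch that reads the creation time out of a UUIDv7. It assumes the standard UUIDv7 layout, where the first 48 bits are a Unix timestamp in milliseconds; the function names are made up for illustration.

```python
import uuid
from datetime import datetime, timezone


def uuid7_timestamp_ms(value: str) -> int:
    """Return the Unix timestamp (milliseconds) embedded in a UUIDv7 string."""
    u = uuid.UUID(value)
    if u.version != 7:
        raise ValueError("not a UUIDv7")
    # Assumption: standard layout, top 48 of the 128 bits are the timestamp.
    return u.int >> 80


def uuid7_datetime(value: str) -> datetime:
    """Convert the embedded timestamp to an aware UTC datetime."""
    return datetime.fromtimestamp(uuid7_timestamp_ms(value) / 1000, tz=timezone.utc)


def elapsed_ms(first: str, second: str) -> int:
    """Milliseconds between the creation of two UUIDv7 values."""
    return uuid7_timestamp_ms(second) - uuid7_timestamp_ms(first)


example = "0190854d-7f9f-78fc-b9bc-598867ebf39a"  # the ID quoted in the comment
print(uuid7_datetime(example).isoformat())
```

Anyone who sees the ID can run the same few lines, which is the whole concern: no key or database access is needed.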

TeeWEE 711 days ago

Bigserial is sequential, so it’s very easy to guess the next number. That gives you the problem of sequential-key attacks…

If you use only the UUID in your outward-facing API, you still have the problem of slow queries, since you need it to find the object (as mentioned below).

UUIDv7 has a random part, can be generated in a distributed fashion, and indexes well.

It’s the best choice for modern applications that support distributed data creation.

j45 710 days ago

Sequential numbers cannot be used publicly.

Also, security can be built around not allowing querying records which are not yours.

I'm all for a little security through obscurity, including UUIDs, but it shouldn't be the sole thing. It's easier to generate a UUID for the sequential ID and let the database do what it does best (relate many serials among each other).

The other part is being able to use what's built into the database out of the box without a lot more configuration.

Selfishly, I always appreciate learning more about Postgres though :)

badestrand 711 days ago

You never expose the bigserial; you generate an ID (like a UUID) for external use/identification and simply have an index over that column for fast selects.
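A rough sketch of that layout, using Python's built-in `sqlite3` as a stand-in for Postgres (so `INTEGER PRIMARY KEY AUTOINCREMENT` plays the role of bigserial and a `TEXT` column plays the role of a `uuid` column). The `orders` table and helper names are hypothetical.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE orders (
        id        INTEGER PRIMARY KEY AUTOINCREMENT, -- internal only, used for joins
        public_id TEXT NOT NULL UNIQUE,              -- external ID; UNIQUE creates an index
        item      TEXT NOT NULL
    )
    """
)


def create_order(item: str) -> str:
    """Insert a row and return only the external ID."""
    public_id = str(uuid.uuid4())
    conn.execute("INSERT INTO orders (public_id, item) VALUES (?, ?)", (public_id, item))
    return public_id


def find_order(public_id: str):
    """Look a row up by its external ID; the unique index keeps this fast."""
    return conn.execute(
        "SELECT id, item FROM orders WHERE public_id = ?", (public_id,)
    ).fetchone()


pid = create_order("widget")
print(pid, find_order(pid))
```

Callers outside the database only ever see `public_id`; the integer `id` stays internal.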

iterateoften 711 days ago

If you have an index on the UUID anyway, having a separate bigserial field for the PK doesn’t help that much.

rezonant 711 days ago

As mentioned elsewhere, it ensures the ability to perform resumable and consistent batching queries across the data set without missing records.

Ordering over an insertion timestamp is not enough if two records may have the same timestamp: You may miss a record (or visit a record twice) across multiple queries.

disneycember 711 days ago

This is solved by sorting by timestamp first, then by the random PK UUID. I don't think slightly simpler batch queries justify leaking time and quantity information, or the complexity of handling two types of IDs.
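The timestamp-then-ID ordering can be sketched as keyset pagination. This is a minimal illustration with `sqlite3` and made-up rows, assuming timestamps are non-negative integers and IDs are non-empty strings; it does not address rows inserted concurrently with an older timestamp.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id TEXT PRIMARY KEY, created_at INTEGER NOT NULL)")
# Hypothetical rows: several share a timestamp on purpose.
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("c3", 100), ("a1", 100), ("b2", 100), ("z9", 101), ("d4", 101)],
)


def fetch_batch(after, limit):
    """Return up to `limit` rows strictly after the (created_at, id) cursor."""
    ts, last_id = after
    return conn.execute(
        """
        SELECT id, created_at FROM events
        WHERE created_at > ? OR (created_at = ? AND id > ?)
        ORDER BY created_at, id
        LIMIT ?
        """,
        (ts, ts, last_id, limit),
    ).fetchall()


def iterate_all(batch_size=2):
    """Walk the whole table in independent queries, resuming from the last row seen."""
    cursor = (-1, "")  # assumption: sorts before every real row
    seen = []
    while True:
        batch = fetch_batch(cursor, batch_size)
        if not batch:
            return seen
        seen.extend(batch)
        last_id, last_ts = batch[-1]
        cursor = (last_ts, last_id)


print(iterate_all())
```

The ID acts purely as a tie-breaker, so rows that share a timestamp are neither skipped nor visited twice across batches.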

kroolik 711 days ago

Having an index over the uuid is equivalent to it being a PK, so why would you bother having both?

blackenedgem 711 days ago

Because it's much better for range queries and joins. When you inevitably need to take a snapshot of the table or migrate the schema somehow, you'll be wishing you had something other than a UUID as the PK.

rezonant 711 days ago

This. Highly recommend using a numeric primary key + UUID. Using UUID relations internally can have some strategic advantages, but when UUIDv4 is used as the only primary key, you completely lose the ability to reliably iterate all records across multiple independent queries.

Also, the external thing isn't just for exposing it out to your own apps via APIs, but way more importantly for providing an unmistakable ID to store within external related systems. For example, in your Stripe metadata.

Doing this ensures that ID either exists in your own database or does not, regardless of database rollbacks, database inconsistencies etc. In those situations a numeric ID is a big question mark: Does this record correspond with the external system or was there a reuse of that ID?

I've been burnt taking over poorly managed systems that saved numeric IDs externally, and in trying to heal and migrate that data, ran into tons of problems because of ill-considered rollbacks of the database. At least after I leave, the systems I build won't be subtly broken by such bad practices in the future.

groestl 711 days ago

Ha? Please elaborate.

stoperaticless 711 days ago

Amen (or similar)

thiht 711 days ago

I don’t understand how that’s an issue. Do you have an example of a possible attack using UUIDv7 timestamp? Is there evidence of this being a real security flaw?

whynotmaybe 711 days ago

The draft spec for UUIDv7 has details about the security considerations: https://www.ietf.org/archive/id/draft-peabody-dispatch-new-u...

The way I see it is that uuid v7 in itself is great for some use but not for all uses.

You always have to remember that a v7 always carries the id's creation time as metadata with it, whether you want it or not. And if you let external users get the v7, they can get that metadata.

I'm not a security expert but I know enough to know that you should only give the minimal data to a user.

My only guess is that v7 being so new, attacks aren't widespread for now, and I know why the author decided not to focus on "if UUID is the right format for a key", because the answer is no 99% of the time.

thiht 711 days ago

That just seems overly cautious. I’d rather use UUIDv7 unless I have a reason not to. The convenience of sortable ids and increased index locality are very much worth the security issues associated with UUIDv7. Maybe I wouldn’t use UUIDv7 for tokens or stuff like that, but DB IDs seem pretty safe.

timmytokyo 710 days ago

I don't get it either. If UUIDv7 lacks security due to its revelation of a timestamp, why don't bigserials also lack security? After all, given a bigserial ID, you can tell whether it was generated before or after some other bigserial ID and thereby infer something about the time it was generated.

j45 710 days ago

BigSerials come from a relational database world for me.

The use of UUIDs for documents may come from a no-sql background.

I use bigserials for relational data in relational databases, and if there is a unique document value needed, a UUID is good.

cqqxo4zV46cp 711 days ago

I don’t understand this thinking. If you understand what’s at play, you can infer the potential security implications. What you’re advocating for is being entirely reactive instead of also being proactive.

thiht 711 days ago

No, I don’t. Even with a timestamp uuids are not enumerable, and honestly I don’t care that the timestamp they were created at is public. Is the version of uuid used being a part of the uuid considered a leak too?

arp242 711 days ago

For almost all use cases just showing a UUIDv7 or sequential ID is fine. There are a few exceptions, but it's not the common case.

mewpmewp2 711 days ago

How would it be fine, e.g. for e-commerce, which is arguably a very large portion of the use cases?

You would be immediately leaking how many orders a day your business is getting with a sequential ID.

arp242 711 days ago

> You would be immediately leaking how many orders a day your business is getting with a sequential ID.

Which is fine for almost all of them. All brick and mortar stores "leak" this too; it's really not that hard to guess number of orders for most businesses, and it's not really a problem for the overwhelming majority.

And "Hi, this is Martin, I'd like to ask a question about order 2bf8aa01-6f4e-42ae-8635-9648f70a9a05" doesn't really work. Neither does "John, did you already pay order 2bf8aa01-6f4e-42ae-8635-9648f70a9a05" or "Alice, isn't 2bf8aa01-6f4e-42ae-8635-9648f70a9a05 the same as what we ordered with 7bb027c3-83ea-481a-bb1e-861be18d21ea?"

Especially for order IDs, UUIDs are a huge PITA because, unlike user IDs and other more "internal" IDs, people can and do want to talk about them. You will need some secondary human-friendly unique ID regardless (possibly obfuscated, if you really want to), and if you have that, then why bother giving UUIDs to people?

badestrand 711 days ago

The best solution is to have a serial identifier internally and a generated ID for external use. And yes, it shouldn't be a UUID, as they are user-hostile; it should be something like 6-10 letters+digits.
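One way such a short external ID could be generated, as a minimal Python sketch. The alphabet (uppercase letters plus digits) and the default length of 8 are assumptions; the comment only says "6-10 letters+digits".

```python
import secrets
import string

# Assumption: uppercase letters and digits; the exact alphabet is a design choice.
ALPHABET = string.ascii_uppercase + string.digits


def new_public_id(length: int = 8) -> str:
    """Return a random ID of `length` characters drawn from ALPHABET."""
    if not 6 <= length <= 10:
        raise ValueError("length must be between 6 and 10")
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


print(new_public_id())
```

IDs this short can collide (8 characters from a 36-symbol alphabet give about 2.8 trillion values), so in practice the column would still need a unique constraint and a retry on conflict.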

inopinatus 711 days ago

There are jurisdictions e.g. Germany in which a consecutive sequence for invoice numbers is a mandatory, legislated requirement (mercifully, gaps are generally permitted, with caveats).

For extra spice, in some places this is legislated as a per-seller sequence, and in others as a per-customer sequence, so there’s no policy you can apply globally, and this once again highlights the separation of concerns between a primary key and a record locator/identifier.

wodenokoto 711 days ago

> consecutive sequence for invoice numbers is a mandatory, legislated requirement (mercifully, gaps are generally permitted, with caveats)

That’s surprising. In Denmark gaps are not allowed. You have to account for all invoices and if you have an invoice numbered 50, then you have at least 50 invoices to account for.

withinboredom 711 days ago

It's nice to be able to have gaps when you change invoicing software. For example, pre-Stripe invoices are numbered <500 and Stripe invoices are >500. This makes it simple for humans to determine where an invoice may be located during the transition year. Further, it means we can plan the entire switch-over in advance, vs. only knowing the invoice number AFTER the switch-over. This makes a huge difference in internal communications to customer support, because you can let them know how things will operate once the switch is done. If you can't have gaps, you won't know how to tell customer support where to find new/old invoices until after the switch.

JodieBenitez 711 days ago

> There are jurisdictions e.g. Germany in which a consecutive sequence for invoice numbers is a mandatory

Same in France. I thought it was a pretty common requirement.

OskarS 711 days ago

Can I ask (as a humble application developer, not a backend/database person), if the two requirements are:

1. The UUIDs should be ordered internally, for B-tree performance

2. The UUIDs should not be ordered externally, for security reasons

Why not use encryption? The unencrypted ID is a sequential id, but as soon as it leaves the database, it's always encrypted. Like, when getting it out:

    SELECT encrypt(id) FROM table WHERE something = whatever;

and when putting stuff in:

    UPDATE table SET something = whatever WHERE id = decrypt(<encrypted-key>)

Seems like the best of both worlds, and you don't need to store separate things.
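For illustration, `encrypt`/`decrypt` on a 64-bit ID could be a keyed, reversible permutation. The sketch below builds one as a small Feistel network with HMAC-SHA256 as the round function, using only the Python standard library. This is an assumption about what such functions might look like, not a vetted cipher, and the key and function names are hypothetical.

```python
import hashlib
import hmac

ROUNDS = 4
MASK32 = 0xFFFFFFFF


def _round(key: bytes, round_no: int, half: int) -> int:
    """Keyed round function: first 32 bits of HMAC-SHA256(round number + half)."""
    msg = bytes([round_no]) + half.to_bytes(4, "big")
    return int.from_bytes(hmac.new(key, msg, hashlib.sha256).digest()[:4], "big")


def encrypt_id(key: bytes, n: int) -> int:
    """Map a 64-bit sequential ID to an opaque 64-bit value."""
    left, right = (n >> 32) & MASK32, n & MASK32
    for r in range(ROUNDS):
        left, right = right, left ^ _round(key, r, right)
    return (left << 32) | right


def decrypt_id(key: bytes, token: int) -> int:
    """Invert encrypt_id by running the rounds in reverse order."""
    left, right = (token >> 32) & MASK32, token & MASK32
    for r in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, r, left), left
    return (left << 32) | right


DEMO_KEY = b"demo-key-not-for-production"  # hypothetical key
for internal_id in (1, 2, 3):
    token = encrypt_id(DEMO_KEY, internal_id)
    print(internal_id, format(token, "016x"), decrypt_id(DEMO_KEY, token))
```

Because the mapping is a permutation, nothing extra has to be stored, but every external ID depends on the key staying the same.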

fauigerzigerk 711 days ago

>Why not use encryption?

Because then you have a key management problem, which adds complexity.

spencerap 711 days ago

If the key and encryption mechanism are ever leaked, those opaque external IDs can be converted easily back to sequence numbers, and vice versa, which might pose a risk for you or your users. You won't be able to rotate the encryption key without breaking anything external that tracks those encrypted IDs... third party services, SEO, user bookmarks, etc.

OskarS 711 days ago

You store the key in the database, right? Like, if the database leaks, it doesn’t matter if your IDs are sequenced or unsequenced, because all data has leaked anyway. The key leaking doesn’t seem like a realistic security issue.

zxexz 711 days ago

Ideally if you do this, you store the key in a separate schema with proper roles so that you can call encrypt() with the database role, which can't select the key. Even then, the decrypted metadata should not be particularly sensitive - and should immutably reference a point in time so you can validate against some known key revocation retroactively.

My take is it's rarely necessary to have a token, that you give to an external entity, that has any embedded metadata at all - 99.9% of apps aren't operating at a scale where even a million-key hashmap sitting in RAM and syncing changes to disk on update would cause any performance difference.

OskarS 711 days ago

This is a very weird thread: half the people are arguing that having these timestamps is not a realistic security problem at all, and the other half is arguing that any fix to it has to have Fort Knox level security policies.

It seems to me: the actual value of knowing these ids/timestamps to a hacker is tiny, but it's not nothing (German tank problem and all that). Like, if a hacker was able to decode the timestamps, it's not ideal, but it's not like a catastrophe either (especially given that half the people in this thread think it has no security value at all). Given that threat model, a simple scheme like I suggested seems fine to me.
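For reference, the German tank problem mentioned above boils down to a one-line estimator. The sketch assumes IDs start at 1, have no gaps, and that the observed IDs are a uniform sample without replacement; real systems rarely satisfy all three, so treat the result as a rough guess.

```python
def estimate_total(observed_ids):
    """Frequentist German tank estimate of the largest ID issued: m + m/k - 1."""
    k = len(observed_ids)
    if k == 0:
        raise ValueError("need at least one observed ID")
    m = max(observed_ids)
    return m + m / k - 1


# Four sequential order IDs an outsider happened to see (made-up numbers).
print(estimate_total([19, 40, 42, 60]))  # → 74.0
```

With only four observations the outsider already guesses that roughly 74 orders exist, which is the kind of quantity leak being weighed in this thread.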

stoperaticless 711 days ago

> The key leaking doesn’t seem like a realistic security issue.

But it is.

If you have a password in a system, you want to rotate it regularly or at least have that ability (for example, when an angry colleague leaves).

sebazzz 711 days ago

> Because UUIDv7 values hold time information, they can help bad actors with timing attacks or pattern recognition: each one contains a timestamp linked to the record.

Are you then not doing security by randomness if that is the thing that worries you?

dajtxx 711 days ago

The comment above warns against it because the embedded timestamp is an info-leak risk. Perhaps that was a problem for them in some circumstance.

inopinatus 711 days ago

It wasn’t a problem for me directly but was observed and related by a colleague: an identifier for an acquired entity embedded the record’s creation timestamp and effectively leaked the date of acquisition despite it being commercial-in-confidence information. Cue post-M&A ruckus at board level.

Just goes to show you can’t inadvertently disclose anything these days.

mixmastamyk 711 days ago

You're saving storage space but potentially leaking details. Is that ok for your application? No one can answer but your org.

hoffs 711 days ago

The details part is so minuscule that I doubt it even matters. You'd have a difficult time trying to enumerate UUIDv7s anyway.

mixmastamyk 711 days ago

Leaking time leaks information about customer growth and usage. It may matter to your competitors.
