Hacker News new | ask | show | jobs
by brycelarkin 710 days ago
For postgres, you want to use “bigint generated always as identity” instead of bigserial.
3 comments

I agree and apologise for writing bigserial out of ossified habit. Identity columns hew to the SQL standard, bigserial is a pg-ism. I actually do still use it when the ORM’s DDL generator prefers as much (looking at you Rails), there’s no gain from fighting the middleware.

For other readers: the two are almost the same in behaviour, they differ primarily in that identity columns don’t need a USAGE grant on their associated sequence, and the ALWAYS makes it harder (but despite the term not actually impossible) to override the default value on row insert.

Why?
Why “bigint generated always as identity” instead of bigserial, instead of Postgres' uuid data type?

Postgres' UUID datatype: https://www.postgresql.org/docs/current/datatype-uuid.html#D...

django.db.models.fields.UUIDField: https://docs.djangoproject.com/en/5.0/ref/models/fields/#uui... :

> class UUIDField: A field for storing universally unique identifiers. Uses Python’s UUID class. When used on PostgreSQL and MariaDB 10.7+, this stores in a uuid datatype, otherwise in a char(32)

> [...] Lookups on PostgreSQL and MariaDB 10.7+: Using iexact, contains, icontains, startswith, istartswith, endswith, or iendswith lookups on PostgreSQL don’t work for values without hyphens, because PostgreSQL and MariaDB 10.7+ store them in a hyphenated uuid datatype type.

From the sqlalachemy.types.Uuid docs: https://docs.sqlalchemy.org/en/20/core/type_basics.html#sqla... :

> Represent a database agnostic UUID datatype.

> For backends that have no “native” UUID datatype, the value will make use of CHAR(32) and store the UUID as a 32-character alphanumeric hex string.

> For backends which are known to support UUID directly or a similar uuid-storing datatype such as SQL Server’s UNIQUEIDENTIFIER, a “native” mode enabled by default allows these types will be used on those backends.

> In its default mode of use, the Uuid datatype expects Python uuid objects, from the Python uuid module

From the docs for the uuid Python module: https://docs.python.org/3/library/uuid.html :

> class uuid.SafeUUID: Added in version 3.7.

> safe: The UUID was generated by the platform in a multiprocessing-safe way

And there's not yet a uuid.uuid7() in the uuid Python module.

UUIDv7 leaks timing information ( https://news.ycombinator.com/item?id=40886496 ); which is ironic because uuids are usually used to avoid the "guess an autoincrement integer key" issue

Just noting, the commenter you replied to said:

> use “bigint generated always as identity” instead of bigserial.

The commenter you are replying to was not saying anything about whether to use UUIDs or not; they just said "if you are going to use bigserial, you should use bigint generated always as identity instead".

The question is the same; why would you use bigint instead of the native UUID type?

Why does OT compare text and UUID instead of char(32) and UUID?

What advantage would there be for database abstraction libraries like SQLalchemy and Django to implement the UUID type with bigint or bigserial instead of the native pg UUID type?

Best practice in Postgres is to use always use the text data type and combine it with check constraints when you need an exact length or max length.

See: https://wiki.postgresql.org/wiki/Don't_Do_This#Text_storage

Also, I think you're misunderstanding the article. They aren't talking about storing a uuid in a bigint. They're talking about have two different id's. An incrementing bigint is used internally within the db for PK and FK's. A separate uuid is used as an external identifier that's exposed by your API.

What needs to be stored as text if there is a native uuid type?

Chapter 8. Data Types > Table 8.2. Numeric Types: https://www.postgresql.org/docs/current/datatype-numeric.htm... :

> bigint: -9223372036854775808 to +9223372036854775807

> bigserial: 1 to 9223372036854775807

2*63 == 9223372036854775807

Todo UUID /? postgres bigint UUID: https://www.google.com/search?q=postgres+bigint+uuid :

- UUIDs are 128 bits, and they're unsigned, so: 2*127

- "UUID vs Bigint Battle!!! | Scaling Postgres 302" https://www.scalingpostgres.com/episodes/302-uuid-vs-bigint-...

"Reddit's photo albums broke due to Integer overflow of Signed Int32" https://news.ycombinator.com/item?id=33976355#33977924 re: IPv6 addresses having 64+64=128 bits

FWIW networkx has an in-memory Graph.relabel_nodes() method that assigns ints to unique node names in order to reduce RAM utilization for graph algorithms: https://networkx.org/documentation/stable/reference/generate...

Many people store UUID's as text in the database. Needles to say, this is bad. TFA starts by proposing that it's bad, then does some tests to show why.

I'm not quite sure what all the links have to do with the topic at hand.