Hacker News new | ask | show | jobs
by panny 1847 days ago
It seems like int vs bigint is brushed off rather quickly here. bigint is twice the size of int, therefore indexing will be larger as well. Furthermore, all the FK storage and indexing will also be bloated by this choice. If you design a customer table with a bigint PK, and everything will point to customer (invoices, billing statements, etc), then that's not an insignificant amount of space. While most of us may want to have "billions served" like McDonald's, the reality is my company and your company will never have 2 billion customer accounts, even in the wildest of imaginations. If you ever did reach this point, it's "a good problem to have" and relatively easy to move from int -> bigint. Moving in the reverse direction is likely difficult or impossible.

It would be nice to see real benchmarking on millions of rows to compare the three, but my gut tells me you use int by default, bigint if you outgrow int, and UUID if you have plenty of money for hardware and need distribution capabilities a UUID would enable.

2 comments

In datamodelling, tables can often be categorized by lifetime. 'Business Relationships' eg. customers, suppliers, products have a fairly long lifetime; whereas 'Business Transactions' are created on a much higher frequency.

I'm generally fairly comfortable using int for business relationships, and bigint (long) for transaction data.

For performance, insertion speed often seems to be dominated by 'commit latency' to sync to the disk; rather than by record size. I would agree that record size affects table scan, but for many datamodels keying may often be a relatively small proportion compared to the size of text fields and other data.

I like to model keyspaces to work for 200 years, for the largest forseeable market growth, times at least a factor of 10 for safety.

Speaking from personal experience, just use bigint... If you aren't dealing with billions of rows, the size difference isn't that big a deal, and if you are dealing with billions of rows, the int -> bigint migration is definitely not "relatively easy".

One of the most memorable anecdotes of my professional career is a production environment going down because we hit maxint on an important (and busy) table. The dirty hack we used to get the site back up (hint: int is _signed_), and the weeks it took to plan, test, and execute the migration.