HN Mirror



	by marcus_holmes 2087 days ago
	I never get this. Why are you sorting by ID?

2 comments

munawwar 2087 days ago

I don't know about the OP, but there are some rarer cases, like DynamoDB, where more indexes mean more $. There you can avoid creating a timestamp column and a new index by having a sortable ID column.

But even without the need to sort by ID, one of the advantages is that sequential IDs make it easier for databases to fetch data, as the stored data ends up being more sequential on disk as well.

More reading on that:

- https://eager.io/blog/how-long-does-an-id-need-to-be/#locali...

- https://www.percona.com/blog/2019/11/22/uuids-are-popular-bu...
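A minimal sketch of the sortable-ID idea, for illustration only. The names `sortable_id` and `created_at_ms` and the 48-bit timestamp / 80-bit random split are assumptions (similar in spirit to ULID), not any particular database's scheme. It shows that sorting by ID alone recovers creation order, so no separate timestamp column or index is needed for that:

```python
import os
import time


def sortable_id(now_ms=None):
    """Hypothetical time-prefixed ID: 48-bit millisecond timestamp
    followed by 80 random bits, rendered as 32 fixed-width hex chars."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    rand = int.from_bytes(os.urandom(10), "big")  # 80 random bits
    # Timestamp in the high bits means lexicographic order follows time order.
    return f"{(now_ms << 80) | rand:032x}"


def created_at_ms(id_hex):
    """Recover the millisecond timestamp embedded in an ID."""
    return int(id_hex, 16) >> 80


# IDs generated out of order still sort by their embedded creation time.
ids = [sortable_id(t) for t in (3000, 1000, 2000)]
ordered = sorted(ids)
```

IDs created within the same millisecond sort in random order relative to each other, which is usually acceptable for "newest first" listings.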


marcus_holmes 2087 days ago

Well, for some databases, yes. This is "clustering", and there's a command to deal with it ([0] is for Postgres, which doesn't do this particularly well; other flavours of database have similar commands).

But it's the usual thing with databases: if you're trying to optimise performance by hacking the engine, then your schema probably needs a good hard looking at.

[0] https://www.postgresql.org/docs/current/sql-cluster.html


lazulicurio 2087 days ago

I don't consider being aware of your data distribution to be "hacking the engine". It's part of a good DBA's job. For any data set of appreciable size you can't treat the data as a black box. Even if you're not clustering the underlying table in postgres, the performance of your indices will be heavily affected by data layout.
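A rough way to see the layout effect, as a toy model rather than a benchmark of any real engine: treat a sorted Python list as a stand-in for an ordered index and count how many inserts land at the very end. The helper name `tail_insert_fraction` and the key counts are made up for this sketch.

```python
import bisect
import random


def tail_insert_fraction(keys):
    """Fraction of inserts that land at the end of a sorted key list,
    a crude proxy for appending to the right-most page of an index."""
    index = []
    tail = 0
    for k in keys:
        pos = bisect.bisect_right(index, k)
        if pos == len(index):
            tail += 1
        index.insert(pos, k)
    return tail / len(keys)


rng = random.Random(0)
sequential = list(range(2000))   # monotonically increasing IDs
shuffled = sequential[:]
rng.shuffle(shuffled)            # stand-in for random (UUID-like) IDs

seq_fraction = tail_insert_fraction(sequential)
rand_fraction = tail_insert_fraction(shuffled)
```

Sequential keys always append at the tail, while shuffled keys almost never do, so the writes scatter across the whole structure. A real B-tree has pages, caching and splits that this model ignores.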


marcus_holmes 2086 days ago

I agree. Hacking the ID field to include index data so the storage clustering is a bit more performant is not "treating the data as a black box" to my mind.


Vanderson 2087 days ago

In my case, I don't really need it, but from what I've read, there seems to be a DB performance boost from having it. And finally, if I want it down the road, it's already there.

Default sorting out of the box is kind of nice anyways.
