| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by antender 1888 days ago
	Also, from the point of query optimisation this is a really bad idea. Usually you DO actually care about size of fields in SQL databases, because something like BOOLEAN is usually stored as single byte (or bit in a bitfield) vs 4 bytes or even 8 in case of timestamp. This not only multiplies on disk usage by at least 4 times, but also makes ALL indexes using this field way bigger. Also boolean indexes can be compressed (or stored as bitmaps), while timestamp indexes contain lots of unique values, so they can't be. This is also the reason why serial IDs are way better than UUIDs for internal IDs.

4 comments

traceroute66 1888 days ago

> This is also the reason why serial IDs are way better than UUIDs for internal IDs.

There are three core problems with that:

  a) Serial IDs are a nightmare for database merges, clustering or anything like that
  b) Serial IDs won't scale
  c) Serial IDs require management, whilst UUIDs can be produced anywhere (in DB, in frontend etc)

There is the KSUID[1] if people want a time-sortable thing that is near-enough to a UUID.

[1] https://github.com/segmentio/ksuid

link

hk1337 1888 days ago

You can do both. Use the UUIDs where you need the scalability and serial where you don’t.

link

marcosdumay 1888 days ago

> Serial IDs won't scale

You mean scaling into several machines? Yes, they do scale. Nothing requires that the values are always increasing and have no holes, so you can slice and cluster them at will (and many DBMS do exactly that).

link

gmfawcett 1888 days ago

I'm confused? Serial means "always increasing and having no holes" -- it's one thing after the other. What you're arguing for sounds like just IDs, not serial IDs.

link

marcosdumay 1888 days ago

Well, by your definition what people use as serial IDs isn't serial.

Databases have the concept of sequences, that are closer to your definition, but make no promises about holes (they normally don't generate holes by themselves, but there is no way to guarantee you won't lose numbers upon usage). It is common to use sequences to feed serial IDs, but not all DBMS do that and it's not a requirement in any way.

link

twodave 1888 days ago

IMO if you’re going this route, may as well just implement a snowflake clone and get the best of all worlds.

link

traceroute66 1888 days ago

Re: Snowflake clone I would humbly invite you to read the note in the KSUID Github, namely:

  *To fit into a 64-bit number space, Snowflake IDs and its derivatives require coordination to avoid collisions, which significantly increases the deployment complexity and operational burden.*

Therefore KSUID remains the best option.

link

twodave 1886 days ago

I don't necessarily agree with your conclusion, but I will admit you can't put KSUID and Snowflake in a room and declare either of them the winner.

I would argue that doubling your key-space in the case of KSUIDs may have just as much of an impact as coordinating node ids in the case of snowflake (and in fact the snowflake technique only runs into coordination problems when you're at pretty extreme scale).

link

forkerenok 1888 days ago

I think this is still workable with an expression/functional index, where the indexed expression is "happened_at IS NOT NULL".

link

isuckatcoding 1888 days ago

Seems like a whole lot of extra trouble to be honest. What happens when you need to create a composite index?

link

forkerenok 1888 days ago

You can do indexes that are composites of plain field values and/or expressions! At least in Postgres.

As for the trouble, I agree, it's a bit more involved.

link

RedShift1 1888 days ago

Storing both boolean and timestamp with the correct constraints seems like a better solution to me.

link

karmakaze 1888 days ago

This is the typical solution for MySQL. There are indexable virtual generated columns but don't seem to have been used/around enough to be battle tested.

Using just the nullable timestamp only lets you index on one of the boolean states: an index on (ts, col2) works for WHERE ts IS NULL ORDER BY col2, but WHERE ts IS NOT NULL ORDER BY col2 doesn't work well (without using generated boolean, etc).

link

MayeulC 1888 days ago

Theoretically, this should compress quite well. In practice, I don't know how much compression db engines apply.

link