Hacker News new | ask | show | jobs
by rpcope1 1102 days ago
After having suffered through the consequences of "type" enums on MySQL, and see some things go through a long life that used "enums" in the database (in multiple different databases, include Postgres), I'm not convinced that either of these are the right choice for representing enumerations. The string with check constraint seems dumb if for no other reason than if the table that uses it winds up having many rows, you're basically burning up lots of extra space for arguably no reason, and if you ever have to alter the name of an enum (or something similar), that update is going to be really expensive.

I think the "right" choice for enums probably looks a little more like:

   CREATE TABLE myEnum(
     enumID SERIAL NOT NULL PRIMARY KEY,
     enumName TEXT UNIQUE NOT NULL,
     -- enum description or other metadata columns here.
   );

  CREATE TABLE foo(
    ...
    associatedEnumID INTEGER NOT NULL REFERENCES myEnum(enumID),
    ...
  );
I think this has the benefit of being space efficient like the native typed enum, while being relatively flexible (easy to change names, add new enum values, add data about the enum itself, etc.)
5 comments

I prefer directly using strings as enums, and using the foreign key constraint only to validate enum values.

    CREATE TABLE my_enum ( name TEXT PRIMARY KEY );

    CREATE TABLE foo ( my_enum TEXT REFERENCES my_enum (name) );
The reason is because a SELECT * FROM foo showing cryptic enum ordinals is a headache, and having to join the enum table every time is potentially slower than just reading from the column directly. An ASCII character only takes 1 byte, so an INT enum is just as space efficient as using 4 characters, which affords way more descriptiveness than a meaningless ordinal number.
If you use longer named enums(eg. my_enum_xyz) in my_enum, does this store a full copy of the enum text bytes of 'my_enum_xyz' into table foo?
Yeah the full text of the enum is stored in the table ('my_enum_xyz' is 11 ASCII characters so it takes up 11 bytes, plus 1 byte needed to store the length of the string).
The point of enums is so the names can be descriptive, which typically will be longer than 4 characters.
Perhaps I was unclear, my underlying point is that short strings are cheap. And enum names are almost always short strings. 4 characters not enough? What about 8? That’s the size of a BIGINT, which everybody now uses for primary keys (they’re not slow). What about 16 characters? That’s the size of a UUID, which everyone now uses for primary keys (they’re not slow).

Even 4 characters (which perform just as fast as Postgres’ native enums) is enough if you develop a system of abbreviations. The airline industry gets by with just three characters!

Just wanted to let you know I really appreciate your comments. I hadn't thought about doing it the way you recommend (in my mind I was thinking "oh, storing enum values as text will be inefficient", but I never did the math as you did to realize storing anything up to 8 chars is no worse than bigint), but I think it's a pretty optimal solution. I also checked my DB where we heavily use enums, and basically every enum text value that is longer than 8 characters could easily be shortened/abbreviated to 8 chars and still be very easily understood. Thanks very much!
Yep thats what I typically do except a string "code" and string "description" that way the raw data is more readable without joining the coded value and many times you want a short code vs long description, like with US states and their abbreviations. Most codes are 1 or 2 characters up to maybe 4 so you end up with less or the same space used than a 32 bit int with UTF-8.

Metadata like sorting and even what date ranges the code is valid for and even security as in who's allowed to use the code in the app.

This is... exactly what PG does under the covers for ENUM types. And also of course this is historically the canonical way to do ENUMs in SQL.
This is exactly what I do, save with natural keys.
In my experience, you need those descriptions and other metadata on these enum values more often than not.