Hacker News new | ask | show | jobs
by SoftTalker 748 days ago
This is DBA 101 stuff. If a database is part of your software, you really need someone on the team who knows how it works.
4 comments

100% this, however my first boss shared some wisdom with me:

"The kind of people who recognize the value of expertise don't need to be told to look for it, the kind of people who don't recognize it can't be told anything."

You're not wrong, but unfortunately many teams don't.

Probably my favourite "ya'll don't understand how databases work" was where they "reserved" space for MySQL enums; for example for the "active" column it would be something like:

  enum(
      'active',
      'deleted',
      '_futureval1',
      '_futureval2',
      '_futureval3',
      '_futureval4',
      '_futureval5',
      '_futureval6',
      '_futureval7',
      '_futureval8',
      '_futureval9'
  )
Enums don't work like that at all; it's just a mapping of int to a string value for readability, and you don't need to "reserve" space to add future values just like you don't need to for ints. Adding a new enum value is easy and cheap. Removing them is not as it requires a full scan of all rows to verify they're used. Even worse, you couldn't easily rename enum labels (at the time, don't know if you can now), making it all worse than useless.

Since this was all on large tables and the effort to fix it was relatively large, but without adding much business value, we never fixed it. We were basically stuck with it. It sure as hell annoyed me every time I looked at it.

I'm not an DBA either, but spending about 5 seconds on the documentation for "enum" would have prevented this. This really doesn't require a PhD in SQL.

I'm not sure what the current state of things are since I haven't use MySQL recently but this used to be a perfectly valid thing to do.

The issue was that MySQL doesn't use a full int to store enums. If your enum has 8 values, it stores in 1 byte, if it has more than 8, it stores it in 2 bytes. Adding that 9th value thus requires re-writing the entire table. So yes - it can make sense to "reserve space" to avoid a future table re-write.

You also had to be careful to include `ALGORITHM=INPLACE, LOCK=NONE;` in your `ALTER TABLE` statement when changing the enum or it would lock the table and rewrite it.

> If your enum has 8 values, it stores in 1 byte, if it has more than 8, it stores it in 2 bytes.

You're probably thinking of the SET type, rather than the ENUM type.

> You also had to be careful to include `ALGORITHM=INPLACE, LOCK=NONE;` in your `ALTER TABLE` statement when changing the enum or it would lock the table and rewrite it.

This is a common misconception; that's not how ALGORITHM=INPLACE, LOCK=NONE works. An ALTER TABLE without ALGORITHM and LOCK will, by default, always use the least-disruptive method of alteration possible.

Adding those clauses in MySQL just tells the server "fail immediately if the requested method is not possible". The semantics in MariaDB are similar, just slightly different for INPLACE, where it means "fail immediately if the requested method or a better one is not possible".

> You're probably thinking of the SET type, rather than the ENUM type.

Ah oui, très Pascal.

You can store 255 values in one byte, and reserving two bytes is not what that did.

And even if I did, it still leaves the inability to actually rename enums without scanning the full table at least twice (which still doesn't seem possible in MariaDB, unless I missed something there).

If you potentially want great flexibility you shouldn't be using enums in the first place but int and a relational mapping to another table.

A byte would fit at least 255 different values, right? How often is this limit exceeded in practice.
I've witnessed companies react with delight at the results after spending millions on consultants when all the consultants did was a few hours of explain query and create index.

Not my problem when I see it, way to go consultants charging millions, but it's amazing how poorly big companies are run that this seriously happens.

You use a database? You have someone who knows how to add indexes right? Right?!

Yes, but consider the marketing, uh… I mean, messaging coming from the NoSQL and DBaaS (CosmosDb, DynamoDb, etc) and Firebase crowd: “databases are hard, let us manage it all for you” - and they’ve got a point: it’s 2024 now, we arguably shouldn’t need to handle those kinds of non-functional requirements by ourselves: a DB engine should be able to automatically infer the necessary indexes from the schema design, and automatically rebuild them asynchronously - if Google can search the web in under a second, then your RDBMS should have no problem querying your data in a fraction of the time.

…which I imagine is the impression made to a lot of (even most?) people who got started writing software-that-uses-a-database within the past decade. If you’re using NodeJS then using a KV-store library feels far more natural than writing SQL in a string. At least the kids are using parameters now, so it’s not like how every PHP+MySQL site was vulnerable to injection attacks…

(I know that recently RDBMS now do implement automatic indexes based on runtime query-profiling, which is great, but it isn’t pre-emptive, and often gets it wrong too)

———-

Also, who calls themselves a “DBA” anymore? That word makes me think of a pipe-and-suspenders type, still employed well-past retirement age because they’re the only ones who knows how to keep the company’s Big Iron (…or AS/400) database from keeling over. Thesedays it’s all “Ops” - “DevOps”, “SysOps”, …”DatabaseOps”?

Google doesn’t search the internet in under a second. They are doing a key-value lookup. Any time they change the search algorithm or add to their index they recompute every query ever searched and prepare the answers to be quickly recalled.