Hacker News new | ask | show | jobs
by sgarland 619 days ago
I am slowly and sadly learning this lesson. My customers are devs, and I want so badly for them to care as deeply as I do about RDBMS optimization, normalization, and referential integrity, but by and large it’s a fool’s errand.

What has worked to some extent is patiently walking them down the path of what happens if those things don’t happen, e.g. “so no one uses lookup tables or enums, so now every row has X unnecessary bytes, which puts pressure on the buffer pool, and then everyone’s queries slow down, and everyone’s SLOs trend down…”

It doesn’t always work; I’ve also had people respond that “they’ll fix it later,” (lolol sure) but it’s had better results than simply explaining why their schema is technically sub-optimal.

The absolute worst to deal with have been those who seem to completely lack empathy, and respond flatly with, “fixing that isn’t on our roadmap, and isn’t likely to be,” even when I explain that in X months, my team will be suffering from their decisions.

2 comments

Maybe you already do, but I've had this exact challenge in the past as well and what has worked best is to send them some SQL queries that do it the "right" way. Often times I think they just don't want to deal with the problem: it's hard, it's uncomfortable, and nobody "up the chain" will care about it. There's plenty of reason not to want to do it.

But giving them the query, or writing the migration for them, often takes care of both one and two. I've even seen this approach ignite a passion for query optimization as it "clicked" for them!

I do this when I can, certainly, but there are only so many hours in a day, and I have larger tasks to deal with much of the time.

What I’d love is for devs to treat SQL the same as they would their primary language, instead of some mysterious and arcane artifact to be abstracted away and ignored. If I refused to use any data structure other than a dict / hashmap because “it’s good enough,” how do you think that would go over?

Of the items you listed, I only care about referential integrity (or rather, I don't want things to be wrong). Unless we are adding millions of rows on a daily basis, microoptimizations don't make a meaningful impact.

Of all the optimizations I have meassured (and I have meassured a fair number), only two types have really moved the needle: do less, and use a better algorithm.

If we need to shave a few percentage of the database access, it is more cost effective for us to get a more powerful database server. Assuming that we already have a cache and aren't doing something stupid like not using an index.

> Unless we are adding millions of rows on a daily basis, microoptimizations don't make a meaningful impact.

At toy scale (if your entire dataset can easily fit into memory on a small instance; probably < 100,000,000 rows total), then some things don’t matter as much. I disagree that they aren’t meaningful, though. Especially with Aurora, and its “what if we added even more latency to queries” model of splitting compute and storage. Go make a table with a few million rows; one column with a sequential ID, the other with UUIDv4. Index them both (in MySQL, you’d need to swap PKs between tests since it clusters), then do any kind of aggregation query, like COUNT, MAX, etc. That’s not a small difference.

A micro-optimization for an RDBMS would be something like switching a MySQL column from VARCHAR to CHAR if you’re positive that the length is fixed, to save the 1-2 bytes/row overhead. It matters at scale (thousands of tables * millions of rows can add up), but not for most.

> If we need to shave a few percentage of the database access, it is more cost effective for us to get a more powerful database server.

People always say this as though it’s fact, but I’ve yet to see any studies on it, and frankly I doubt its veracity. Take the current RDS pricing for on-demand Postgres, r7g.large vs. r7g.xlarge: in us-east-1, it’s $0.239/hr vs. $0.478/hr. That difference works out to about $2000/yr, for a single-AZ DB of the smallest non-burstable type they have. Since microservices remain all the rage, multiply that by N. You’re telling me it’s not worth the money to spend a few hours reading docs and applying the knowledge? I don’t buy it. Moreover, as someone who enjoys the craft of SWE, it’s physically painful to watch people say “eh, let’s just throw money at it” instead of understanding where their code is slow, and solving it.