by remus 615 days ago
Very interesting, but I think the author overstates the importance of alignment a little. Unless your data/indexes are already of a challenging size for your hardware (or you expect them to be imminently), fiddling with byte alignment details feels like a fairly premature optimisation.

Disk is cheap, memory is plentiful, your time is expensive etc.

4 comments

As mentioned in the article, it's a good idea to consider this when creating a new table, since it's essentially a free optimization. However, it's probably not worth the hassle of reordering a production table for that.

> Disk is cheap, memory is plentiful, your time is expensive etc.

One thing to keep in mind, though, is that while you often have plenty of disk space, RAM is still relatively expensive. It's also divided into many smaller buffers, such as working memory and shared buffers, which are not that large. These optimizations help to fit more data into cache.
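To put rough numbers on the cache point, here is a back-of-the-envelope sketch in Python. It assumes the default 8 kB page, a 24-byte page header, a 4-byte line pointer per row, and a 24-byte row header (no null bitmap), and it ignores fillfactor and variable-width columns. The two row widths (40 and 24 bytes of column data) are made up for illustration, not measured from any real table.

```python
# Rough estimate of how many rows fit in one Postgres heap page.
# Assumptions: default 8 kB pages, 64-bit build, no null bitmap, fillfactor 100.
PAGE_SIZE = 8192      # default block size in bytes
PAGE_HEADER = 24      # fixed page header
LINE_POINTER = 4      # one item pointer per row
TUPLE_HEADER = 24     # 23-byte row header padded to an 8-byte boundary


def rows_per_page(data_bytes):
    """Rows per page for a row with `data_bytes` of (already padded) column data."""
    per_row = TUPLE_HEADER + data_bytes + LINE_POINTER
    return (PAGE_SIZE - PAGE_HEADER) // per_row


# Hypothetical row widths: 40 bytes with padding, 24 bytes after reordering.
padded = rows_per_page(40)
packed = rows_per_page(24)
print(padded, packed)
```

Under those assumptions the narrower row fits about 30% more rows into each page, and therefore into each buffer held in shared buffers.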

However, what the article said about alignment being important for indexes is somewhat misleading. Reordering an index field is not the same as reordering columns in a table. Besides having to rewrite queries, it also changes the access pattern and the time required to access the data, which is often much more significant than the space saved. Indexes are, by nature, a tradeoff where you give up space to gain time, so this mindset doesn't really apply there.

Hey, author here.

> Indexes are, by nature, a tradeoff where you give up space to gain time, so this mindset doesn't really apply there.

I agree that (re)aligning indexes is a different beast entirely, but (as mentioned in my recommendation) ideally the developer should keep this in mind when creating the index initially.

Factors like cardinality and even readability should take precedence over perfect alignment, but all else being equal, aligning your indexes from the very moment they are introduced in the codebase is the ideal scenario IMO.

> Disk is cheap, memory is plentiful, your time is expensive etc.

Spend 30 minutes one day playing around with Postgres, trying different column combinations out. Boom, you now know how best to order columns. This doesn’t seem like a big ask.

The flip side is that changing data at scale is HARD, so if you put things like this off, when you do finally need to squeeze bytes, it’s painful.

Also, memory is absolutely not plentiful. That’s generally the biggest bottleneck (or rather, the lack of it then makes IO the bottleneck) for an RDBMS, assuming you have connection pooling and aren’t saturating the CPU with overhead.

> Disk is cheap, memory is plentiful, your time is expensive etc.

Taking the time to know the in-memory sizing for your data types is well worth it. Thinking about which types to use and sorting them by size also takes minimal effort and is well worth it.
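For anyone who wants to see the effect of sorting by size without spinning up a database, here is a small Python sketch. It only models fixed-width types, assumes a 64-bit build where rows are padded to 8 bytes, and assumes no NULLs; the column list is a made-up example and the helper names are mine, not anything Postgres exposes.

```python
# Model of alignment padding for fixed-width Postgres columns.
# name -> (size in bytes, required alignment in bytes)
TYPES = {
    "boolean": (1, 1),
    "smallint": (2, 2),
    "integer": (4, 4),
    "bigint": (8, 8),
    "timestamptz": (8, 8),
}


def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def data_size(columns):
    """Bytes of column data for one row, including padding between columns."""
    offset = 0
    for type_name in columns:
        size, alignment = TYPES[type_name]
        offset = align_up(offset, alignment) + size
    return align_up(offset, 8)  # the whole row is padded to 8 bytes


def sort_by_alignment(columns):
    """Widest alignment first, which avoids padding between fixed-width columns."""
    return sorted(columns, key=lambda t: TYPES[t][1], reverse=True)


# Hypothetical column order, as someone might naturally write it.
naive = ["boolean", "bigint", "smallint", "timestamptz", "integer"]
print(data_size(naive), data_size(sort_by_alignment(naive)))  # 40 24
```

Same five columns, 40 bytes versus 24 bytes per row, purely from ordering. Variable-length types like text follow different rules, so treat this as a mental model rather than an exact calculator.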

It may make sense for the system to do it automatically for newly created tables. But maybe not, as it’s possible you’d want the data layout to match some existing structure.

> Disk is cheap, memory is plentiful, your time is expensive etc.

Index size is not solely a storage concern. I also don't really care about how much disk space I pay for, but sometimes I care a lot about how long it takes to vacuum a table.