
This is what I was trying to get at: using column vectors gives good cache locality and lets you use SIMD for "multiply all of these by this scalar" on each column, and then for "sum all of these" to accumulate the resulting rows. I'd imagine it could also let you optimize multiplications into things like bit-shifts with minimal overhead, though I have no idea if that's done in practice. Maybe only tangentially related, but I feel like this talk on Halide[0] is really illustrative of the general concepts.
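
To make the access pattern concrete, here is a rough sketch of a matrix-vector product done column by column. The function name `matvec_by_columns` and the list-of-columns layout are just my assumptions for illustration; pure Python won't actually vectorize anything, but both inner passes walk contiguous data, which is the shape a compiler or a library like NumPy could turn into SIMD.

```python
def matvec_by_columns(cols, x):
    """Compute y = A @ x, where A is stored as a list of columns (hypothetical layout)."""
    n_rows = len(cols[0])
    y = [0.0] * n_rows
    for col, scalar in zip(cols, x):
        # "multiply all of these by this scalar": one contiguous pass over a column
        scaled = [scalar * a for a in col]
        # "sum all of these": accumulate element-wise into the result rows
        y = [acc + s for acc, s in zip(y, scaled)]
    return y


# A = [[1, 2],
#      [3, 4],
#      [5, 6]] stored by column
cols = [[1.0, 3.0, 5.0], [2.0, 4.0, 6.0]]
x = [10.0, 1.0]
y = matvec_by_columns(cols, x)
print(y)  # [12.0, 34.0, 56.0]
```

Each column is scaled by a single broadcast scalar, so there is no per-element gather or stride to worry about.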

As others have mentioned, for some operations it can also save you from loading whole columns that aren't relevant for your transformation. The compression point in the sibling comment is definitely also relevant, especially for serialization. A whole lot of reasons to use column vectors.
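
A toy sketch of both points, assuming a table stored as a dict of columns. The table contents, `column_sum`, and `run_length_encode` are made up for illustration, and run-length encoding is just one compression scheme that happens to suit columnar data, not necessarily the one the sibling comment had in mind.

```python
# Hypothetical columnar table: each column is its own contiguous list.
table = {
    "user_id": [1, 2, 3, 4],
    "country": ["DE", "DE", "US", "US"],
    "amount": [5.0, 7.5, 2.5, 10.0],
}


def column_sum(table, name):
    """Aggregate one column; the other columns are never read."""
    return sum(table[name])


def run_length_encode(values):
    """Collapse runs of equal neighbours into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs


total = column_sum(table, "amount")
encoded = run_length_encode(table["country"])
print(total)    # 25.0
print(encoded)  # [['DE', 2], ['US', 2]]
```

With a row-oriented layout, the sum would have to step over `user_id` and `country` for every record, and the repeated country values would be interleaved with other fields instead of sitting next to each other.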

Using "column-major" here might've been terminology abuse; sorry for the confusion.

[0] https://www.youtube.com/watch?v=3uiEyEKji0M