Hacker News new | ask | show | jobs
by 6gvONxR4sf7o 1877 days ago
> linear-algebra-like transformations

> To do this, you generally want your data in column-major format

I'd argue that the basic element of linear algebra is matrix vector multiplication, which I figured was best done row-major. Column major is great in other data use cases, but 'linear-algebra-like, therefore column major' doesn't feel right.

3 comments

I don't know about linear algebra, but column major lets you compress thus:

* Dictionary encoding: US,US,US,US,FR -> US:0,FR:1;0,0,0,0,1

* Run-length encoding: 0,0,0,0,1 -> 4x0,1x1

* Delta encoding: 0,1,2,3,4 -> 5x'+1'

* Storing the min and max for a chunk

Basically: exploit the data type to compress it.

Which enables very fast filtering and projections. (And now that the IO bottleneck has been managed you can do your gigantic logistic regression)

It sounds like you're thinking about the mat-vec operation in terms of "Grab one row of the matrix, take the dot-product with the vector, and repeat for each row of the matrix."

But it's also possible to think of it as "Grab one element of the vector, use it to scale the corresponding col of the matrix, and repeat, summing results." Both are efficient means of finding the result, and both have block-level versions that play nicely with the machine cache.

Meanwhile, linear algebra also often involves finding vector norms, and scaling vectors, and so on, and the way we usually set up tables means that the vectors of interest are generally columns of the data tables.

This is what I was trying to get at - using column vectors gives good cache locality and lets you use SIMD for "multiply all of these by this scalar" for each column, and then for "sum all of these" for the resulting rows. I'd imagine it could also let you optimize multiplications into things like bit-shifts with minimal overhead as well, though I have no idea if that's done in practice. Maybe only tangentially related, but I feel like this talk on Halide[0] is really illustrative of the general concepts.

As others have mentioned, for some operations it can also save you from loading whole columns that aren't relevant for your transformation. The compression point in the sibling comment is definitely also relevant, especially for serialization. A whole lot of reasons to use column vectors.

Using "column-major" here might've been terminology abuse; sorry for the confusion.

[0] https://www.youtube.com/watch?v=3uiEyEKji0M

"column" here refers to a type of data. let's say you have a bunch of records of purchases. one column would be price, another column would be quantity.

if you're doing a linear algebra like transformation, you want to do it on all the prices or all the quantities, and a linear algebra library expects a big array of numbers, which is why you have to transform your records into an array of prices and an array of quantities.

"column" here refers to properties of objects, and not rows vs columns with in an array of number