|
I don't know about linear algebra, but column major lets you compress thus: * Dictionary encoding: US,US,US,US,FR -> US:0,FR:1;0,0,0,0,1 * Run-length encoding: 0,0,0,0,1 -> 4x0,1x1 * Delta encoding: 0,1,2,3,4 -> 5x'+1' * Storing the min and max for a chunk Basically: exploit the data type to compress it. Which enables very fast filtering and projections. (And now that the IO bottleneck has been managed you can do your gigantic logistic regression) |