by BenoitP 1877 days ago
I don't know about linear algebra, but column-major storage lets you compress like this:

* Dictionary encoding: US,US,US,US,FR -> US:0,FR:1;0,0,0,0,1

* Run-length encoding: 0,0,0,0,1 -> 4x0,1x1

* Delta encoding: 0,1,2,3,4 -> start at 0, then 4x'+1'

* Storing the min and max for a chunk, so a scan can skip chunks that cannot match a filter
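
For anyone who wants to poke at the first three encodings, here is a rough Python sketch run on the same toy columns as above. The function names are made up for illustration and are not taken from any particular columnar engine; real formats pack these results into bits rather than Python lists.

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    codes = {}
    encoded = [codes.setdefault(v, len(codes)) for v in values]
    return codes, encoded


def run_length_encode(values):
    """Collapse consecutive repeats into (run_length, value) pairs."""
    return [(len(list(run)), value) for value, run in groupby(values)]


def delta_encode(values):
    """Return the first value plus the differences between neighbours."""
    if not values:
        return None, []
    return values[0], [b - a for a, b in zip(values, values[1:])]


codes, encoded = dictionary_encode(["US", "US", "US", "US", "FR"])
runs = run_length_encode(encoded)
start, deltas = delta_encode([0, 1, 2, 3, 4])
delta_runs = run_length_encode(deltas)

print(codes, encoded)      # {'US': 0, 'FR': 1} [0, 0, 0, 0, 1]
print(runs)                # [(4, 0), (1, 1)]
print(start, delta_runs)   # 0 [(4, 1)]
```

Note how the encodings stack: the dictionary codes feed the run-length step, and the deltas of a sorted column collapse into a single run.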

Basically: exploit the data type to compress it.

All of this enables very fast filtering and projections. (And now that the IO bottleneck has been dealt with, you can run your gigantic logistic regression.)
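
To make the filtering point concrete, here is a small sketch of how per-chunk min/max can let a scan skip whole chunks. The chunk size, the names build_chunks and filter_greater_than, and the "greater than" predicate are all assumptions picked for the example, not how any specific database does it.

```python
def build_chunks(values, chunk_size):
    """Split a column into chunks and keep (min, max, data) for each one."""
    chunks = []
    for i in range(0, len(values), chunk_size):
        chunk = values[i:i + chunk_size]
        chunks.append((min(chunk), max(chunk), chunk))
    return chunks


def filter_greater_than(chunks, threshold):
    """Return matching values and how many chunks actually had to be read."""
    matches = []
    scanned = 0
    for lo, hi, chunk in chunks:
        if hi <= threshold:
            # Nothing in this chunk can match, so skip it without reading it.
            continue
        scanned += 1
        matches.extend(v for v in chunk if v > threshold)
    return matches, scanned


column = list(range(12))               # hypothetical column: 0..11
chunks = build_chunks(column, 4)       # three chunks of four values
matches, scanned = filter_greater_than(chunks, 8)
print(matches, scanned)                # [9, 10, 11] 1
```

Only one of the three chunks is read here. On disk, the skipped chunks are IO that never happens.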