by 6gvONxR4sf7o
1877 days ago
> linear-algebra-like transformations

> To do this, you generally want your data in column-major format

I'd argue that the basic element of linear algebra is matrix-vector multiplication, which I figured was best done row-major. Column-major is great in other data use cases, but 'linear-algebra-like, therefore column-major' doesn't feel right.
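A rough illustration of the layout point, as a pure-Python sketch with hypothetical function names and a matrix stored as one flat list. With row-major storage, each output element y[i] is a dot product over one contiguous row. The same product can also be computed from column-major storage as a running sum of scaled contiguous columns. So which layout is "best" depends largely on which loop order you use. Python lists won't show the cache effects that make this matter in practice; the sketch only shows the two traversal patterns.

```python
def matvec_row_major(a, nrows, ncols, x):
    """y = A @ x with A stored row-major: A[i][j] lives at a[i * ncols + j]."""
    y = [0.0] * nrows
    for i in range(nrows):
        row = a[i * ncols:(i + 1) * ncols]  # contiguous slice: row i
        y[i] = sum((r * xj for r, xj in zip(row, x)), 0.0)  # dot(row i, x)
    return y


def matvec_col_major(a, nrows, ncols, x):
    """y = A @ x with A stored column-major: A[i][j] lives at a[j * nrows + i]."""
    y = [0.0] * nrows
    for j in range(ncols):
        col = a[j * nrows:(j + 1) * nrows]  # contiguous slice: column j
        for i in range(nrows):
            y[i] += col[i] * x[j]  # accumulate x[j] * (column j) into y
    return y


# A = [[1, 2, 3],
#      [4, 5, 6]]
A_row = [1, 2, 3, 4, 5, 6]   # rows laid out one after another
A_col = [1, 4, 2, 5, 3, 6]   # columns laid out one after another
x = [1, 0, -1]
print(matvec_row_major(A_row, 2, 3, x))  # [-2.0, -2.0]
print(matvec_col_major(A_col, 2, 3, x))  # [-2.0, -2.0]
```

Both functions return the same vector. Only the memory access pattern differs.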
* Dictionary encoding: US,US,US,US,FR -> US:0,FR:1;0,0,0,0,1
* Run-length encoding: 0,0,0,0,1 -> 4x0,1x1
* Delta encoding: 0,1,2,3,4 -> 0;4x'+1'
* Storing the min and max for a chunk
Basically: exploit the data type to compress it.
This enables very fast filtering and projections. (And now that the I/O bottleneck has been handled, you can run your gigantic logistic regression.)
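To make the encodings in the list concrete, here is a minimal pure-Python sketch. It assumes plain in-memory lists, and all function names are hypothetical. Real columnar formats such as Parquet use their own bit-packed variants of these schemes. The per-chunk min/max shows one way those stats speed up filtering: any chunk whose [min, max] range can't overlap the query is skipped without being decoded (often called a "zone map").

```python
from itertools import groupby


def dictionary_encode(values):
    """Map each distinct value to a small integer code, in first-seen order."""
    codes = {}
    encoded = []
    for v in values:
        if v not in codes:
            codes[v] = len(codes)
        encoded.append(codes[v])
    return codes, encoded


def run_length_encode(values):
    """Collapse consecutive repeats into (count, value) pairs."""
    return [(len(list(group)), v) for v, group in groupby(values)]


def delta_encode(values):
    """Keep the first value, then store the difference to each predecessor."""
    return values[:1] + [b - a for a, b in zip(values, values[1:])]


def chunk_stats(values, chunk_size):
    """Record (min, max) for each fixed-size chunk of the column."""
    return [(min(values[i:i + chunk_size]), max(values[i:i + chunk_size]))
            for i in range(0, len(values), chunk_size)]


def chunks_that_may_match(stats, lo, hi):
    """Indices of chunks whose [min, max] overlaps [lo, hi]; others are skipped."""
    return [i for i, (mn, mx) in enumerate(stats) if mx >= lo and mn <= hi]


countries = ["US", "US", "US", "US", "FR"]
codes, encoded = dictionary_encode(countries)
print(codes, encoded)              # {'US': 0, 'FR': 1} [0, 0, 0, 0, 1]
print(run_length_encode(encoded))  # [(4, 0), (1, 1)]

deltas = delta_encode([0, 1, 2, 3, 4])
print(deltas)                      # [0, 1, 1, 1, 1]
print(run_length_encode(deltas))   # [(1, 0), (4, 1)]

stats = chunk_stats([3, 5, 1, 40, 42, 47, 8, 9], chunk_size=3)
print(stats)                                 # [(1, 5), (40, 47), (8, 9)]
print(chunks_that_may_match(stats, 41, 45))  # [1]
```

The encodings stack: dictionary codes become long runs that RLE compresses well, and a steadily increasing column turns into a constant delta that RLE also collapses.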