| HN Mirror


by rm999 4865 days ago

I think this is actually quite different from random projection, which creates linear combinations of several continuous variables; RP is much more similar to PCA in that respect. OP's method works on a single categorical variable: it essentially shuffles the possible values, then 'folds' them down to an arbitrary size (the number of buckets your hash function maps into).

>ideally they'd be using a cryptographic-strength hash fn

Cryptographic strength adds nothing here; there is no danger from someone reversing the hash (and if there is, you may have bigger problems). Any hash function that redistributes the values in a suitably random way should suffice.
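To make the 'shuffle, then fold' description concrete, here is a minimal sketch of hashing one categorical value into a fixed number of buckets. The names `hash_bucket` and `hashed_one_hot` are made up for illustration, and MD5 is only a stand-in for "any hash with a reasonably random output", not a claim about what OP used.

```python
import hashlib


def hash_bucket(value: str, n_buckets: int) -> int:
    """Map a categorical value to a bucket index in [0, n_buckets)."""
    # Assumption: MD5 is used purely as a well-mixed, deterministic hash.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # The hash 'shuffles' the values; the modulo 'folds' them down to n_buckets.
    return int.from_bytes(digest[:8], "big") % n_buckets


def hashed_one_hot(value: str, n_buckets: int) -> list:
    """Encode one categorical value as a fixed-length indicator vector."""
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec


if __name__ == "__main__":
    for colour in ["red", "green", "blue", "chartreuse"]:
        print(colour, hashed_one_hot(colour, 8))
```

Distinct values can land in the same bucket; that collision rate is the price of choosing a small `n_buckets`.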

1 comment

fiatmoney 4865 days ago

It's a sparse random matrix, but definitely in the same family of techniques.
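One way to read the 'sparse random matrix' remark, as a rough sketch: hashing a one-hot encoded category gives the same result as multiplying that one-hot vector by a 0/1 matrix with exactly one 1 per row. The helper names below (`bucket`, `projection_matrix`, `project`) and the use of MD5 are assumptions for illustration only.

```python
import hashlib


def bucket(value: str, n_buckets: int) -> int:
    # Assumption: MD5 stands in for any well-mixed hash function.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def projection_matrix(categories, n_buckets):
    """Row i has a single 1, in the column that category i hashes to."""
    return [
        [1 if bucket(c, n_buckets) == j else 0 for j in range(n_buckets)]
        for c in categories
    ]


def project(one_hot, matrix):
    """Plain vector-matrix product: one_hot (length d) times matrix (d x m)."""
    n_buckets = len(matrix[0])
    return [
        sum(one_hot[i] * matrix[i][j] for i in range(len(matrix)))
        for j in range(n_buckets)
    ]


if __name__ == "__main__":
    categories = ["a", "b", "c", "d", "e"]
    matrix = projection_matrix(categories, 3)
    one_hot_c = [0, 0, 1, 0, 0]  # one-hot encoding of "c"
    print(project(one_hot_c, matrix))
```

In practice nobody materialises this matrix; the hash function computes the one non-zero column on the fly.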

I was more concerned about the distributional properties of different hash functions than about reversibility. Checksums-as-hashes often work, but it's not that much work to swap in something a little more robust. If you want a good output distribution, pick something where that's a goal of the algorithm.
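A quick way to check the output distribution on your own keys, as a sketch: bucket the same keys with a checksum and with a general-purpose hash, then compare how far each set of bucket counts is from uniform. Adler-32 and MD5 are just example choices, and the `user_N` keys are synthetic; results will depend on what your real keys look like.

```python
import hashlib
import zlib
from collections import Counter


def checksum_bucket(key: str, n_buckets: int) -> int:
    # Checksum-as-hash: Adler-32 was designed for error detection, not mixing.
    return zlib.adler32(key.encode("utf-8")) % n_buckets


def md5_bucket(key: str, n_buckets: int) -> int:
    # A hash whose design goals include a uniform-looking output.
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def bucket_counts(keys, bucket_fn, n_buckets):
    """Count how many keys land in each bucket."""
    counts = Counter(bucket_fn(k, n_buckets) for k in keys)
    return [counts.get(b, 0) for b in range(n_buckets)]


def chi_square(counts):
    """Chi-square statistic against a uniform expectation; lower is more even."""
    expected = sum(counts) / len(counts)
    return sum((c - expected) ** 2 / expected for c in counts)


if __name__ == "__main__":
    keys = [f"user_{i}" for i in range(10_000)]  # synthetic example keys
    for name, fn in [("adler32", checksum_bucket), ("md5", md5_bucket)]:
        print(name, round(chi_square(bucket_counts(keys, fn, 64)), 1))
```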
