by rm999
4818 days ago
I think this is actually quite different from random projection, which creates linear combinations of several continuous variables. RP is much more similar to PCA in that respect. OP's method works on a single categorical variable and essentially shuffles the possible values, then 'folds' them down to an arbitrary size (the output size of your hashing function).

>ideally they'd be using a cryptographic-strength hash fn

Cryptographic strength adds nothing; there is no danger from someone reversing the hash (and if there is, you may have bigger problems). Any hash function that is suitably random in its redistribution of the values should suffice.
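For what it's worth, the "shuffle, then fold" description can be sketched in a few lines of Python. This is only an illustration, not OP's actual code: the function names, the choice of MD5, and the bucket count are all assumptions made for the example.

```python
import hashlib


def hash_bucket(value, n_buckets):
    """Map one categorical value to a bucket index in [0, n_buckets).

    MD5 is used here only because it mixes well and is stable across runs;
    it is an assumption for this sketch, not a requirement. Python's
    built-in hash() is avoided because string hashes are salted per process.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    # "Shuffle": treat the first 8 bytes of the digest as a large integer.
    shuffled = int.from_bytes(digest[:8], "big")
    # "Fold": reduce that integer to the chosen number of buckets.
    return shuffled % n_buckets


def one_hot_hashed(value, n_buckets):
    """Encode a single categorical value as a fixed-length indicator vector."""
    vec = [0] * n_buckets
    vec[hash_bucket(value, n_buckets)] = 1
    return vec


if __name__ == "__main__":
    for colour in ["red", "green", "blue"]:
        print(colour, one_hot_hashed(colour, 8))
```

Distinct values can land in the same bucket; that collision is the price of folding an open-ended set of categories into a fixed size.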
|
I was concerned more about distributional properties of different hash functions than reversibility. Checksums-as-hashes often work, but it's not that much work to swap in something a little more robust. If you want a good output distribution, pick something where that's a goal of the algorithm.
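A rough way to check the distribution point is to count how many buckets actually get used. The sketch below compares Adler-32 (a checksum) with MD5 on short, similar keys. The key format, the bucket count, and the choice of these two functions are assumptions for illustration only. Adler-32's low 16 bits are just one plus the sum of the input bytes, so with a power-of-two bucket count, short keys built from similar characters can only reach a narrow range of buckets.

```python
import hashlib
import zlib


def adler_bucket(key, n_buckets):
    """Bucket index from the Adler-32 checksum of the key."""
    return zlib.adler32(key.encode("utf-8")) % n_buckets


def md5_bucket(key, n_buckets):
    """Bucket index from the first 8 bytes of the MD5 digest of the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_buckets


def buckets_used(bucket_fn, keys, n_buckets):
    """Count how many distinct buckets the keys occupy under bucket_fn."""
    return len({bucket_fn(k, n_buckets) for k in keys})


if __name__ == "__main__":
    # Hypothetical key set: short strings that differ only in a numeric suffix.
    keys = ["user%d" % i for i in range(10000)]
    n_buckets = 1024
    print("adler32:", buckets_used(adler_bucket, keys, n_buckets))
    print("md5:    ", buckets_used(md5_bucket, keys, n_buckets))
```

On longer or more varied keys the gap may be much smaller, which fits the observation that checksums often work; the check is cheap enough to run on your own data before deciding.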