by nighthawk454
402 days ago
I guess it depends on how loosely we take the definition. Wikipedia has a hash as just a function that maps variable-length sequences to fixed-length sequences, and by that definition most embedding networks fit. Hashes are often assumed to be 1-D, discrete-valued, deterministic, uniformly distributed, and hard to reverse, while embeddings are often assumed to have semantic structure, so in practice the two have some pretty different properties. But going by the strict definitions, I'd say if hashing is just mapping to a fixed-size output space and an embedding is a projection/mapping of one space onto another (usually smaller) space, then they're similar. Some hash algorithms, like SimHash and other LSH schemes, use random projection onto sets of random hyperplanes to produce output vectors, which blurs the line fairly well. You could even implement that as a NN with a single projection layer, or indeed with the torch.nn.Embedding class. Of course, the outputs are usually then quantized or even binarized, but that's more of a use-case-specific performance optimization than something fundamental (and embeddings are sometimes quantized too).
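To make the random-hyperplane point concrete, here is a minimal sketch of a SimHash-style hash in plain Python. The names `make_hyperplanes`, `simhash`, and `hamming` are made up for illustration, and the Gaussian hyperplanes and fixed seed are assumptions, not any particular library's implementation. It is one linear projection followed by a sign, so it reads a lot like a single-layer network with binarized output.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    # Hypothetical helper: one random Gaussian normal vector per output bit.
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def simhash(vec, planes):
    # Project onto each hyperplane normal and keep only the sign,
    # i.e. a linear projection followed by binarization.
    return tuple(
        1 if sum(p_i * v_i for p_i, v_i in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )

def hamming(a, b):
    # Number of differing bits between two equal-length codes.
    return sum(x != y for x, y in zip(a, b))

planes = make_hyperplanes(n_bits=16, dim=3)
v = [1.0, 2.0, 3.0]
code = simhash(v, planes)
```

The code depends only on the direction of the input: scaling `v` by a positive constant leaves it unchanged, and vectors separated by a small angle tend to differ in few bits, which is the locality-sensitive part.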
The only similarity at all is that they're both algorithms that map from one domain to another. So your logic collapses into "all mapping functions are hashes whenever the output domain is smaller than the input domain", which is obviously wrong. And it's additionally wrong because the output domain of a semantic vector is 1500 infinities (dimensions) larger than the input. So even as a "mapper" it's doing the inverse of what a hash does.