

by sdenton4 1723 days ago

hm. I'd like to believe, but the arguments here seem a bit obtuse.

No one measures vector distance using the Hamming distance on binary representations. That's silly. We usually use L1 or L2, and the binary encoding of the numbers is irrelevant.
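To make that concrete, here is a minimal sketch, assuming float32 values and using made-up helper names (`float32_bits`, `raw_hamming`). It counts the differing bits between the raw IEEE 754 encodings of two numbers and shows that the count has little to do with how close the numbers are.

```python
import struct

def float32_bits(x: float) -> int:
    """Return the IEEE 754 single-precision bit pattern of x as an unsigned int."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def raw_hamming(a: float, b: float) -> int:
    """Count differing bits between the float32 encodings of a and b."""
    return bin(float32_bits(a) ^ float32_bits(b)).count("1")

# Largest float32 below 1.0 (bit pattern 0x3F7FFFFF), about 6e-8 away from 1.0.
just_below_one = struct.unpack("<f", struct.pack("<I", 0x3F7FFFFF))[0]

near = raw_hamming(1.0, just_below_one)  # numerically almost identical
far = raw_hamming(1.0, 2.0)              # numerically far apart
print(near, far)  # prints: 24 8
```

The nearly identical pair differs in 24 bits while the distant pair differs in only 8, so a hash has to be built (or learned) so that Hamming distance tracks L2; the raw encoding does not give that for free.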

It sounds like the LSH is maaaaybe equivalent to vector quantization. In which case this would be a form of regularization, which sometimes works well, and sometimes meh.


mish15 1723 days ago

I was part of the above article. Happy to answer questions.

In terms of accuracy, it totally depends on the resolution needed. We can get >99% of the accuracy of L2 waaaaay faster, with 1/10 of the memory overhead. For what we are doing, that is the perfect trade-off.

In terms of LSH, we tried projection hashing and quantization and were always disappointed.
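For readers unfamiliar with the term, a rough sketch of projection hashing follows. It assumes the random-hyperplane flavor of LSH (the comment does not say which variant was tried), and the names `make_hyperplanes`, `projection_hash`, and `hamming` are hypothetical. Each bit records which side of a random hyperplane the vector falls on.

```python
import random

def make_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian normal vectors, one per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def projection_hash(vec, planes):
    """Pack the sign of each projection into an integer, one bit per hyperplane."""
    code = 0
    for plane in planes:
        dot = sum(p * x for p, x in zip(plane, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code

def hamming(a, b):
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")

planes = make_hyperplanes(dim=8, n_bits=32, seed=1)
v = [0.5, -1.0, 2.0, 0.1, -0.3, 1.5, -2.2, 0.7]
same_direction = [2 * x for x in v]   # scaling does not change any sign
opposite = [-x for x in v]            # negation flips every sign

print(hamming(projection_hash(v, planes), projection_hash(same_direction, planes)))
print(hamming(projection_hash(v, planes), projection_hash(opposite, planes)))
```

Note that every bit here costs the same and is placed at random, regardless of where the data actually needs resolution.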


sdenton4 1723 days ago

So it seems like the neural network producing the neural hash is still a standard CNN operating on the usual vector representations? And then the learned hash gets used in a downstream problem...

Or is there actually some interesting hash-based neural algorithm lurking around somewhere?


mish15 1722 days ago

Yes and yes.

Network-based hashing is great for maximising the information quality of the hash (compared to other LSH methods). It works to compress existing vectors super efficiently.

Very soon things like language embeddings will skip the vectors, and networks will instead output hashes directly. These are much faster, as the network can learn to spend more bits where it needs resolution, as opposed to using floatXX for everything. It’s amazing to see it work, but it's not fully there yet.


cellis 1723 days ago

Hello! First, I would like to say this is a very cool writeup. I'm not a computer scientist, but I do dabble a bit in neural networks. Is it possible this could be used to build a convolutional neural network?


tomnipotent 1723 days ago

The goal is to end up with a binary encoding whose Hamming distances approximate Euclidean nearest neighbors (so basically L2). Combine it with quantized SIFT/GIST for images and you end up with a fast-and-dirty model with decent results.
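Once the codes exist, the lookup side is just XOR plus a bit count. A minimal sketch, assuming the codes are already computed and stored as plain integers (the 8-bit values and the name `nearest_by_hamming` are made up for illustration):

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions where the two codes differ."""
    return bin(a ^ b).count("1")

def nearest_by_hamming(query: int, codes: list, k: int = 1) -> list:
    """Return the indices of the k stored codes closest to query in Hamming distance."""
    order = sorted(range(len(codes)), key=lambda i: hamming(query, codes[i]))
    return order[:k]

# Toy 8-bit codes standing in for hashed image descriptors.
codes = [0b10110010, 0b10110011, 0b01001101, 0b11110000]
query = 0b10110110

print(nearest_by_hamming(query, codes, k=2))  # prints: [0, 1]
```

This is a brute-force scan; how well the Hamming ranking matches the true L2 ranking depends entirely on how the codes were produced.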


onlyrealcuzzo 1723 days ago

How would these ML predictions be explainable?
