by fzliu 1349 days ago
Hashes are fine, but to say that "vectors are over" is just plain nonsense. We continue to see vectors as a core part of production systems for entity representation and recommendation (example: https://slack.engineering/recommend-api) and within models themselves (example: multimodal and diffusion models). For folks into metrics, we're building a vector database specifically for storing, indexing, and searching across massive quantities of vectors (https://github.com/milvus-io/milvus), and we've seen close to exponential growth in terms of total downloads.

Vectors are just getting started.

5 comments

True. The title is just clickbait and what we find inside is suggestions for dimensionality reduction by a person who appears to be on the verge of reinventing autoencoders disguised as neural hashes. Is it a mere coincidence that the article fails to mention autoencoders?
Clickbait title aside :^), I'd agree. Neural hashes seem to be a promising advancement imo, but I question their impact on the convergence time of AI models. In the pecking order of neural network bottlenecks, I'd imagine it's not terribly expensive to access training data from some database. Rather, hardware considerations for improving parallelism seem to be the biggest hurdle [1].

[1] - https://www.nvidia.com/en-us/data-center/nvlink/

Yes, this is funny to read when (a) embeddings are such a huge leap in reusable machine learning investment and (b) almost nobody is using them yet. On the other hand, neural hashes do look similar to the density tree analysis that is the first step in many of our applications of language embeddings. It makes sense to me that some of this might be incorporated into vector dbs in the near future. Do you have plans to?
Frequently people use vectors as a hash. It's a bit like a fashionista declaring clothes obsolete.
For face search, I also needed to find vectors in a database.

I used random projection hashing to increase the search speed, because you can match on the hash directly (or at least narrow down the search) instead of calculating the Euclidean distance for each row.
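
For anyone curious what that looks like, here is a minimal sketch, not the setup described above. It assumes the common sign-of-dot-product variant of random projection hashing, where each bit records which side of a random hyperplane a vector falls on. All function names (`make_planes`, `hash_vector`, `build_index`, `search`) are made up for illustration, and the data is random rather than real face embeddings. Note that this variant groups vectors by angle, so the exact Euclidean distance is still computed, but only on the rows that share the query's bucket.

```python
import math
import random
from collections import defaultdict


def make_planes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplanes (Gaussian normals) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_vector(vec, planes):
    """One bit per hyperplane: True if the vector lies on its positive side."""
    return tuple(
        sum(p * v for p, v in zip(plane, vec)) >= 0.0 for plane in planes
    )


def build_index(vectors, planes):
    """Group row ids by hash so a lookup touches only one bucket."""
    buckets = defaultdict(list)
    for row_id, vec in enumerate(vectors):
        buckets[hash_vector(vec, planes)].append(row_id)
    return buckets


def search(query, vectors, buckets, planes):
    """Return candidate row ids from the query's bucket, nearest first."""
    candidates = buckets.get(hash_vector(query, planes), [])
    # Exact Euclidean distance, but only over the narrowed-down candidates.
    return sorted(candidates, key=lambda i: math.dist(query, vectors[i]))


# Illustrative usage with random 16-dimensional vectors (assumed data).
rng = random.Random(1)
vectors = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
planes = make_planes(dim=16, n_bits=8)
buckets = build_index(vectors, planes)
hits = search(vectors[42], vectors, buckets, planes)
```

More bits mean smaller buckets and faster lookups, at the cost of missing near neighbors that land on the other side of a plane. A common mitigation is to build several indexes with different seeds and merge the candidates.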