Affinity Is All You Need
1 points by PHOTON1233 794 days ago
Vector databases are great. Semantic search and advanced re-ranking have made incredible search engines like Perplexity, Cohere, and Marqo possible. But they have a fundamental problem everyone seems to ignore: we're querying unstructured data, and the pieces we need may not always be semantically similar! Now stick with me here.

How it is right now: AI search is like trying to find your favorite video game in a massive bargain bin. You guess roughly where it is, shove your hand in, and pray to god you pull out that game.

How it should be: Let's take a video game I'm trying to find as an example. I know a couple of factors: it's a fantasy | action | single-player | RPG | with detailed character creation | a masterful modding community | and something about a Golden Claw? Hmm.

But I forgot the name of the game... (Some of you have probably figured it out.)

Vector databases will pull up all sorts of snippets of information by querying for each relevant factor, and it's the job of the LLM to find similarities between these snippets at inference time. That's a lot of bargain bin diving for each query!

What we need is to pre-process the unstructured data. We need to create relationships (affinities) between our snippets of information that connect semantically dissociated pieces together (something semantic search alone will fail at). Then all we do is retrieve the video game that has an affinity to all of our factors. We quite possibly now have a way to retrieve targeted pieces of information with pinpoint accuracy.
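Here's a rough sketch of what I mean, assuming the pre-processing step has already produced (item, factor) pairs. The data, the placeholder games "Game A" and "Game B", and the function names build_index and retrieve are all made up for illustration; this is not the actual Affinity AI implementation.

```python
from collections import defaultdict

# Hypothetical affinities a pre-processing step might extract from raw text.
# "Game A" and "Game B" are placeholders, not real titles.
AFFINITIES = [
    ("Skyrim", "fantasy"), ("Skyrim", "action"), ("Skyrim", "single-player"),
    ("Skyrim", "RPG"), ("Skyrim", "character creation"),
    ("Skyrim", "modding community"), ("Skyrim", "Golden Claw"),
    ("Game A", "fantasy"), ("Game A", "action"), ("Game A", "RPG"),
    ("Game B", "single-player"), ("Game B", "modding community"),
]


def build_index(pairs):
    """Map each factor to the set of items that have an affinity to it."""
    index = defaultdict(set)
    for item, factor in pairs:
        index[factor].add(item)
    return index


def retrieve(index, factors):
    """Return only the items that have an affinity to every given factor."""
    item_sets = [index.get(factor, set()) for factor in factors]
    if not item_sets:
        return set()
    return set.intersection(*item_sets)


index = build_index(AFFINITIES)
print(retrieve(index, ["fantasy", "RPG", "modding community", "Golden Claw"]))
```

The lookup is just a set intersection over an index built ahead of time, so nothing has to be compared or reasoned about at query time.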

Not just that, we can now trace the affinities back to show why the chosen video game is the best possible answer. We can even relax a factor or two to recommend similar video games!
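As a toy illustration of both ideas, assuming each item already carries a set of factors: explain reports which requested factors an item matches, and similar ranks the other items by how many factors they share. The data, the placeholder games, and both function names are hypothetical.

```python
# Hypothetical item -> factors mapping produced by pre-processing.
# "Game A" and "Game B" are placeholders, not real titles.
ITEM_FACTORS = {
    "Skyrim": {"fantasy", "action", "single-player", "RPG",
               "character creation", "modding community", "Golden Claw"},
    "Game A": {"fantasy", "action", "RPG"},
    "Game B": {"single-player", "modding community"},
}


def explain(item, factors):
    """Show, factor by factor, whether the item has that affinity."""
    return {factor: factor in ITEM_FACTORS[item] for factor in factors}


def similar(factors, exclude=()):
    """Rank items by number of matching factors, dropping items with none."""
    scored = []
    for item, item_factors in ITEM_FACTORS.items():
        if item in exclude:
            continue
        matched = len(item_factors.intersection(factors))
        if matched > 0:
            scored.append((item, matched))
    # Most matches first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


query = ["fantasy", "action", "RPG", "Golden Claw"]
print(explain("Skyrim", query))
print(similar(query, exclude={"Skyrim"}))
```

Counting shared factors is only one way to "take a slightly different direction"; weighting factors differently would be an equally reasonable choice.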

We are now also guaranteed to retrieve ALL of the relevant information that has been linked, rather than just the top-n documents within an embedding distance, which can miss crucial tidbits.
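A toy comparison of the two retrieval styles, with invented similarity scores standing in for real embedding distances. The snippets, scores, and the names top_n and by_affinity are all assumptions for the sake of the example, and the "retrieve everything" property only holds for links the pre-processing step actually captured.

```python
# Hypothetical snippets: (text, made-up similarity to the query, linked item).
# "Game B" is a placeholder, not a real title.
SNIPPETS = [
    ("Fantasy action RPG with deep character creation", 0.91, "Skyrim"),
    ("Huge modding community keeps the game alive", 0.84, "Skyrim"),
    ("One quest involves an item called the Golden Claw", 0.32, "Skyrim"),
    ("Cozy single-player farming game", 0.40, "Game B"),
]


def top_n(snippets, n):
    """Keep only the n snippets with the highest similarity score."""
    ranked = sorted(snippets, key=lambda snippet: snippet[1], reverse=True)
    return [text for text, _score, _item in ranked[:n]]


def by_affinity(snippets, item):
    """Keep every snippet linked to the item, whatever its score."""
    return [text for text, _score, linked in snippets if linked == item]


print(top_n(SNIPPETS, 2))             # the low-scoring Golden Claw snippet is cut
print(by_affinity(SNIPPETS, "Skyrim"))  # all three Skyrim snippets come back
```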

Pre-processing our unstructured data into an affinities format can leave us with MUCH faster inference and cheaper operating costs, and we can get away with much lighter post-processing steps.

This is what I'm working on, and I hope to have a working MVP API that anyone can use to turn their conversational LLMs into truly personal assistants with dynamic long-term memory of your emotions, experiences, relationships, preferences, aspirations, and more. Keep a lookout for Affinity AI.

P.S. The game was Skyrim!