Affinity Is All You Need
1 points by PHOTON1233 796 days ago
Vector databases are great. Semantic search and advanced re-ranking have made incredible search engines like Perplexity, Cohere and Marqo possible. However, they have a fundamental problem everyone seems to ignore: we're querying unstructured data in which the relevant pieces may not always be semantically similar! Now stick with me here.

How it is right now: AI search is like trying to find your favorite video game in a massive bargain bin. You guess roughly where it is, shove your hand in, and pray to god you pull out that game.

How it should be: Let's take a video game I'm trying to find as an example. I know a couple of factors: it's a fantasy | action | single-player | RPG | with detailed character creation | a masterful modding community | and something about a Golden Claw? Hmm.

But I forgot the name of the game... (Some of you have probably figured it out.)

Vector databases will pull up all sorts of snippets of information when queried for each relevant factor, and it's the job of the LLM to find similarities between these snippets at inference time. That's a lot of bargain bin diving for each query!

What we need to do is pre-process the unstructured data. We need to create relationships (affinities) between our snippets of information that connect semantically dissociated pieces together (something semantic search fails at). All we then do is retrieve the video game that has an affinity to all our factors. We quite possibly now have a way to retrieve targeted pieces of information with pinpoint accuracy.
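As a rough illustration of that retrieval step, here is a minimal sketch that assumes the affinities have already been extracted into a simple item-to-factors mapping. The data, the placeholder names "Game B" and "Game C", and the functions `build_index` and `find_items` are all made up for this example; this is not the actual Affinity AI implementation.

```python
from collections import defaultdict

# Hypothetical, hand-written affinities (item -> factors). Illustrative data only.
AFFINITIES = {
    "Skyrim": {"fantasy", "action", "single-player", "rpg",
               "character creation", "modding community", "golden claw"},
    "Game B": {"fantasy", "action", "single-player", "rpg"},
    "Game C": {"single-player", "rpg", "modding community", "farming"},
}

def build_index(affinities):
    """Invert item -> factors into factor -> items (done once, at pre-processing time)."""
    index = defaultdict(set)
    for item, factors in affinities.items():
        for factor in factors:
            index[factor].add(item)
    return index

def find_items(index, factors):
    """Return the items that have an affinity to every requested factor."""
    factors = list(factors)
    if not factors:
        return set()
    candidate_sets = [index.get(factor, set()) for factor in factors]
    return set.intersection(*candidate_sets)

INDEX = build_index(AFFINITIES)
QUERY = ["fantasy", "action", "single-player", "rpg",
         "character creation", "modding community", "golden claw"]
print(find_items(INDEX, QUERY))
```

With this toy data, only the item linked to every factor survives the intersection, so the query-time work is a handful of set lookups rather than an LLM pass over retrieved snippets.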

Not just that, we can now work our way back to prove why the chosen video game is the best possible answer. We can even take slightly different directions to recommend similar video games!
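One way this could look in practice, again as a sketch over made-up data: `explain` lists which query factors an item matches or misses, and `recommend` ranks the other items by how many factors they share. Both function names, the overlap-count scoring, and the placeholder games are assumptions for illustration.

```python
# Hypothetical, hand-written affinities (item -> factors). Illustrative data only.
AFFINITIES = {
    "Skyrim": {"fantasy", "action", "single-player", "rpg",
               "character creation", "modding community", "golden claw"},
    "Game B": {"fantasy", "action", "single-player", "rpg"},
    "Game C": {"single-player", "rpg", "modding community", "farming"},
}

def explain(affinities, item, factors):
    """Split the query factors into those the item matches and those it misses."""
    owned = affinities.get(item, set())
    matched = [f for f in factors if f in owned]
    missing = [f for f in factors if f not in owned]
    return matched, missing

def recommend(affinities, factors, exclude=(), limit=3):
    """Rank items by the number of query factors they share, best first."""
    wanted = set(factors)
    scored = [
        (len(wanted & item_factors), item)
        for item, item_factors in affinities.items()
        if item not in exclude
    ]
    # Drop items with no overlap, then sort by score (descending) and name.
    scored = [(score, item) for score, item in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:limit]]

QUERY = ["fantasy", "action", "single-player", "rpg",
         "character creation", "modding community", "golden claw"]
print(explain(AFFINITIES, "Skyrim", QUERY))
print(recommend(AFFINITIES, QUERY, exclude={"Skyrim"}))
```

The matched list doubles as the justification for an answer, and relaxing the "must match everything" rule to "matches most" is what turns retrieval into recommendation.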

We are now also guaranteed to retrieve ALL of the relevant information, rather than just the top-n documents within an embedding distance, which can miss crucial tidbits.

Pre-processing our unstructured data into an affinity format can leave us with MUCH faster inference and cheaper operating costs, and we can get away with much lighter post-processing steps.

This is what I'm working on, and I hope to have a working MVP API that anyone can use to convert their conversational LLMs into truly personal assistants with dynamic long-term memory of your emotions, experiences, relationships, preferences, aspirations, and more. Keep a lookout for Affinity AI.

P.S. The game was Skyrim!

1 comments

So you want to transform the problem into keyword search on graphs. Each of your affinities matches some node in the graph, and the problem is to find the relations and intermediary nodes that connect the answer node to the given affinities.
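A small sketch of that formulation, under stated assumptions: the graph below is a made-up undirected adjacency mapping, and `central_node` uses a simple heuristic (breadth-first search from each matched node, then pick the node with the smallest total distance) rather than an exact minimal connecting subgraph.

```python
from collections import deque

# Hypothetical undirected graph as an adjacency mapping. All node names are made up.
GRAPH = {
    "Skyrim": {"fantasy", "rpg", "Golden Claw quest"},
    "fantasy": {"Skyrim", "Game B"},
    "rpg": {"Skyrim", "Game B"},
    "Golden Claw quest": {"Skyrim", "golden claw"},
    "golden claw": {"Golden Claw quest"},
    "Game B": {"fantasy", "rpg"},
}

def bfs_distances(graph, start):
    """Hop counts from start to every node reachable from it."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist

def central_node(graph, keywords):
    """Pick the non-keyword node with the smallest total distance to all keyword nodes."""
    keywords = set(keywords)
    per_keyword = [bfs_distances(graph, k) for k in keywords]
    best, best_cost = None, None
    for node in sorted(graph):  # sorted for deterministic tie-breaking
        if node in keywords:
            continue
        if not all(node in dist for dist in per_keyword):
            continue  # not connected to every keyword node
        cost = sum(dist[node] for dist in per_keyword)
        if best_cost is None or cost < best_cost:
            best, best_cost = node, cost
    return best, best_cost

print(central_node(GRAPH, ["fantasy", "rpg", "golden claw"]))
```

Note that "golden claw" only reaches the answer through an intermediary node, which is the kind of indirect connection the problem statement is about.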
It's very much a knowledge graph that builds upon itself as the affinities (node and relationship triples) pile up from user interactions. For example: User: Sam and I are thinking of making our relationship official.... Affinities picked up: Girlfriend---OFFICIALLY_DATING--->Sam.

Keywords are soo 2022. Semantic similarity is still employed to find pre-existing nodes and their relationships (affinities); if none are found, new affinities are made, expanding the user's personal KG.
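To make the "find a pre-existing node, otherwise create one" step concrete, here is a toy sketch. It is an assumption-heavy stand-in: `difflib.SequenceMatcher` string similarity replaces real embedding similarity, and the `PersonalGraph` class, the 0.8 threshold, and the example triples are all hypothetical.

```python
from difflib import SequenceMatcher

def similarity(a, b):
    """Stand-in for embedding similarity: plain string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

class PersonalGraph:
    """Toy personal knowledge graph stored as (subject, relation, object) triples."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold  # assumed cut-off for "same node"
        self.nodes = []
        self.triples = set()

    def resolve(self, label):
        """Return the closest existing node, or add the label as a new node."""
        best = max(self.nodes, key=lambda n: similarity(n, label), default=None)
        if best is not None and similarity(best, label) >= self.threshold:
            return best
        self.nodes.append(label)
        return label

    def add_affinity(self, subject, relation, obj):
        """Store a triple, reusing existing nodes where they are similar enough."""
        triple = (self.resolve(subject), relation, self.resolve(obj))
        self.triples.add(triple)
        return triple

kg = PersonalGraph()
kg.add_affinity("Girlfriend", "OFFICIALLY_DATING", "Sam")
kg.add_affinity("girlfriend", "LIKES", "Skyrim")  # reuses the "Girlfriend" node
print(kg.nodes)
print(sorted(kg.triples))
```

In a real system the `similarity` function would compare embeddings, and the threshold would need tuning so that distinct entities are not merged by accident.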

It's fun to build and maybe something can come out of it. Thanks for the comment!

I thought so. It's just that "keyword search on graphs" is the name of the search problem, even if something other than keyword matching is used to pick the nodes from which to find a minimal connecting graph and a central result node.
That's pretty much it. There's also building on top of an existing graph and making sure new nodes are coherent and well connected to surrounding nodes, which is a much tougher can of worms. Also, navigating a KG (without multiple LLM hops) is another challenge that needs to be solved for 'working-out retrieval': something that can navigate a forest of dividing paths without resorting to an LLM call. Thanks for the name "keyword search on graphs". I found some great articles on the topic.