Hacker News
by IanCal 291 days ago
That's the one! It's not even that weird a case compared to others, but it is an excellent example.

Here's the history of the Paris example, where there was one university, then many, then fewer: https://en.wikipedia.org/wiki/University_of_Paris

Answering the question "what university is referred to by X?" depends on why you want to know; there are multiple possible answers. Again, it's not the weirdest case, but it's a good, clear example of some of the issues.

There's a company called Merck, and a company called Merck. One Merck is called Merck in the US but MSD outside it. The other Merck is called Merck outside the US and EMD inside it. Technically, the first one is Merck & Co., which used to be part of the other Merck but later wasn't, and the naming split comes from trademark disputes, which still aren't all resolved.

This is an area where I think LLMs actually have space to step in. We have tried modelling everything perfectly so that computers, which have no ability to manage ambiguity, can answer some questions. We have tried barely modelling anything and letting humans figure out the rest, but they're typically pretty poor at crafting the code, and that has its own issues. We ended up largely settling on spending a lot of human time modelling some things, then having other humans build tooling around those models to answer specific questions by writing code, and a third group who get to actually ask the questions.

LLMs can manage ambiguity, and they can also do more technical, code-based things. Historically, we haven't really had anything that could handle ambiguity like this for arbitrary tasks without a lot of expensive human time.

I am now wondering if anyone has done a graph db where the edges are embedding vectors rather than strict terms.


> I am now wondering if anyone has done a graph db where the edges are embedding vectors rather than strict terms.

Curious: how would you imagine it working if there were such a graph db?

I had the idea a few hours ago, so I'm sure there are holes in it, but my first thought is to form a graph where the relationship isn't a fixed label but a description, which is then embedded as a vector.

First of all, consider that, in a way, each edge label is a one-hot binary vector, and we search using only exact, binary matching. A consequence is that any data outside that very narrow path is missed by a search. A simple step could be to change that to match anything within X similarity of some target vector. Could you then search "(fixed term) is a love interest of b?" and have b? filled from facts like "(fixed term) is intimate with Y" and "(fixed term) has a date with Z"?
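A minimal sketch of that contrast, under loud assumptions: the three-number vectors are hand-picked toy values standing in for a real text-embedding model (none is used here), `X` plays the role of "(fixed term)", and `exact_match`, `fuzzy_match` and the 0.9 threshold are hypothetical names and values chosen only for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Each edge is (subject, description, object, vector). The vectors are
# hand-picked toy values standing in for a real embedding of each description.
EDGES = [
    ("X", "is intimate with", "Y", [0.9, 0.3, 0.1]),
    ("X", "has a date with", "Z", [0.8, 0.4, 0.2]),
    ("X", "works with", "W", [0.1, 0.2, 0.95]),
]
# Assumed (toy) embedding of the query relation "is a love interest of".
QUERY_VECTOR = [0.85, 0.35, 0.15]

def exact_match(subject, label, edges):
    """Classic lookup on the label string.

    Equivalent to comparing one-hot label vectors: identical labels score 1,
    anything else scores 0, so only a literal match is returned.
    """
    return [obj for subj, desc, obj, _vec in edges if subj == subject and desc == label]

def fuzzy_match(subject, query_vec, edges, threshold=0.9):
    """Return objects whose edge vector has cosine similarity >= threshold with the query."""
    return [obj for subj, _desc, obj, vec in edges
            if subj == subject and cosine(vec, query_vec) >= threshold]

print(exact_match("X", "is a love interest of", EDGES))  # []
print(fuzzy_match("X", QUERY_VECTOR, EDGES))             # ['Y', 'Z']
```

The exact lookup finds nothing because no edge is literally labelled "is a love interest of", while the threshold search picks up Y and Z and skips the unrelated "works with" edge. Choosing the threshold is the hard part: set it too low and unrelated relations leak in (at 0.3 here, W comes back too), and at 1.0 it only accepts vectors pointing in exactly the same direction, which is essentially exact matching again.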

There are probably issues (I'm sure there are), but some blend of structured querying with a bit of fuzziness feels potentially useful.

Isn't this exactly what Neo4j does for GraphRAG?
Is that using vectors for edges, or for searching the nodes? I'm talking about encoding the edges as vectors for traversal.
Yes, you can do that with Neo4j.
Interesting, thanks, I'll have to explore that.
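To make the "edges as vectors for traversal" distinction concrete, here is a rough sketch in plain Python (not Neo4j's API) of a breadth-first walk that decides which edges to follow by comparing each edge's vector to a query vector, rather than using vectors to find starting nodes. The graph, the toy vectors and the `traverse` helper are all hypothetical; a real system would get the vectors from an embedding model.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical graph: node -> list of (description, edge vector, neighbour).
# The vectors are hand-picked toy values, NOT output from a real embedding model.
GRAPH = {
    "A": [("is intimate with", [0.9, 0.3, 0.1], "B"),
          ("works with", [0.1, 0.2, 0.95], "D")],
    "B": [("has a date with", [0.8, 0.4, 0.2], "C")],
    "C": [("works with", [0.1, 0.2, 0.95], "E")],
}
# Assumed (toy) embedding of a "romantic relationship" style query.
QUERY_VECTOR = [0.85, 0.35, 0.15]

def traverse(graph, start, query_vec, threshold=0.9, max_hops=2):
    """Breadth-first walk that only follows edges whose vector is similar to query_vec.

    Returns (node, hops) pairs in the order they were first reached.
    """
    seen = {start}
    reached = []
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # depth limit reached; don't expand this node
        for _description, vec, neighbour in graph.get(node, []):
            if neighbour not in seen and cosine(vec, query_vec) >= threshold:
                seen.add(neighbour)
                reached.append((neighbour, hops + 1))
                frontier.append((neighbour, hops + 1))
    return reached

print(traverse(GRAPH, "A", QUERY_VECTOR))  # [('B', 1), ('C', 2)]
```

Starting from A, the walk follows "is intimate with" to B and then "has a date with" to C, but never takes the "works with" edges, so D and E are not reached.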