by soco 39 days ago
But can you actually get usable results from those embeddings, especially in multilingual setups? In my experience the similarities they find are more random than not, and without building some (fckin expensive) ontology and graph search you're done for. Sample size of one: I'm trying to build a pipeline able to answer legal questions like "cases where self-defense was rejected" or "discussion about parental authority vs custody". The vector RAG collects seemingly random results that match one term or the other strongly, but mostly without any link to the actual problem.

Edit: I didn't try query rewriting though; it might have mitigated this a bit, but probably not hugely.