by marginalia_nu 1555 days ago
> As a few of you noticed, narrow searches do not work very well because this is not a general web search engine and has a tiny index compared to Google. Use Teclis to discover more about a broader topic you are interested in and to discover writing from 'clean' websites on the web.

Are you getting better results with vector search?

I've been looking at this problem with my search engine as well. I recently side-loaded all of Stack Overflow and Stack Exchange, and searching that part of the index is still not great at finding narrow results the way you can on bigger search engines, even though that should reasonably be possible.

I think that, beyond the fact that my index is DIY and fairly crude, the problem is that algorithms like BM25 are designed to identify topical keywords. They do that rather well, but narrow searches go far beyond the topic and often involve words that aren't important to the document as a whole, only to some particular context within it.
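To make the point about BM25 concrete, here is a minimal sketch of the standard Okapi BM25 scoring function. It is not the commenter's implementation: `bm25_scores` is a hypothetical name, documents are assumed to be pre-tokenized lists of words, and `k1=1.2`, `b=0.75` are just commonly used defaults. The IDF term uses the non-negative `log(1 + ...)` variant that Lucene uses.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each tokenized document against a tokenized query with Okapi BM25.

    Hypothetical helper for illustration; k1 and b are common defaults, not tuned values.
    """
    if not docs:
        return []
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(t for d in docs for t in set(d))
    scores = []
    for d in docs:
        tf, dl = Counter(d), len(d)
        s = 0.0
        for term in query:
            f = tf[term]
            if not f:
                continue
            # Rare terms get a high IDF; terms in every document get a small one.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b penalizes documents longer than average.
            s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        scores.append(s)
    return scores
```

The two knobs reflect the behaviour described above: repeated occurrences of a term saturate through `k1`, and `b` normalizes by document length, so a term that is central to one paragraph but mentioned once in a long page contributes relatively little to that page's score.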

I may have some ideas to get around this, but they're fairly half-baked. Experiments are needed.


Not OP, but I am working on a search engine with vector ranking. Why do you say that vector search would help with narrow queries? In my experience, semantic search helps broaden the query to find adjacent ideas without exact term matches.

Hybrid approaches that use vector search for broad matches and rerank using BM25 could be what you’re looking for. See https://blog.vespa.ai/efficient-open-domain-question-answeri...

> "Hybrid approaches that use vector search for broad matches and rerank using BM25"

Hybrid approaches, e.g. Learning to Rank, normally do it the other way around, since the main benefit of a hybrid is to mitigate the cost (time) of vector search. That is, use a non-vector search (e.g. BM25) to get a broadly relevant set of results first, and quickly, and then use the much more computationally expensive vector search to rerank that smaller result set. There are various approaches to making vector search more viable across large corpora, e.g. Locality Sensitive Hashing and Approximate Nearest Neighbour search, but if you've implemented one of those, then I'm not sure there'd be any benefit in retaining a hybrid approach.
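The ordering described here (cheap lexical retrieval first, then an expensive vector rerank of the shortlist) can be sketched as below. This is an illustration under assumptions, not Vespa's or any particular Learning to Rank implementation: document embeddings are assumed to be precomputed by some unspecified model and passed in as plain lists of floats, cosine similarity stands in for whatever vector scoring a real system uses, and `two_stage_search`, `candidates` and `top_k` are hypothetical names. The vector stage here is brute force; at scale it is the part an LSH or ANN index would replace.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    # Okapi BM25 with Lucene-style non-negative IDF (illustrative defaults).
    n = len(docs)
    avgdl = sum(len(d) for d in docs) / n
    df = Counter(t for d in docs for t in set(d))
    out = []
    for d in docs:
        tf, dl = Counter(d), len(d)
        s = 0.0
        for t in query:
            f = tf[t]
            if f:
                idf = math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5))
                s += idf * f * (k1 + 1) / (f + k1 * (1 - b + b * dl / avgdl))
        out.append(s)
    return out


def cosine(u, v):
    # Cosine similarity between two dense vectors; 0.0 if either is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def two_stage_search(query_terms, query_vec, docs, doc_vecs, candidates=100, top_k=10):
    # Stage 1: cheap lexical retrieval over the whole corpus; keep only lexical matches.
    lexical = bm25_scores(query_terms, docs)
    matched = (i for i, s in enumerate(lexical) if s > 0)
    shortlist = sorted(matched, key=lambda i: lexical[i], reverse=True)[:candidates]
    # Stage 2: expensive vector comparison, run only on the shortlist.
    reranked = sorted(shortlist, key=lambda i: cosine(query_vec, doc_vecs[i]), reverse=True)
    return reranked[:top_k]
```

Only the `candidates` shortlist is ever compared against `query_vec`, which is where the cost saving comes from. The flip side is that a document with no lexical match never reaches the vector stage, however close its embedding is.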

> Why do you say that vector search would help with narrow queries?

I was just asking whether he'd seen better results. I haven't experimented very much with it on my search engine. It's as crude as they get, and in part I want to see how far I can push old-fashioned 1970s search algorithms :P

Vector search is good for broad searches. Narrow searching is a problem of crawling, not ranking, IMO. Teclis crawls a very particular and small portion of the web, which is the main reason it cannot find results for more specific searches.