by freediver 1555 days ago
Hey all - creator here. It looks like the next page of results does not currently work because of a wrong query param (it should be "q" instead of "topics"). Easy enough to change manually if you need it.
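
If you would rather script the manual fix, here is a minimal Python sketch of the rename described above. The helper name `rename_query_param` and the example.com URL are hypothetical placeholders, not the site's actual URL or code; only the "topics" to "q" rename comes from the post.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def rename_query_param(url, old="topics", new="q"):
    """Return url with the query parameter `old` renamed to `new`.

    Other parameters and their order are left untouched.
    """
    parts = urlsplit(url)
    pairs = [
        (new if key == old else key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    return urlunsplit(parts._replace(query=urlencode(pairs)))


# Hypothetical "next page" link, used only for illustration.
fixed = rename_query_param("https://example.com/search?topics=static+site&page=2")
print(fixed)
```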

As a few of you noticed, narrow searches do not work very well because this is not a general web search engine and has a tiny index compared to Google. Use Teclis to discover more about a broader topic you are interested in and to discover writing from 'clean' websites on the web.

Looking forward to feedback to improve!

4 comments

> As a few of you noticed, narrow searches do not work very well because this is not a general web search engine and has a tiny index compared to Google. Use Teclis to discover more about a broader topic you are interested in and to discover writing from 'clean' websites on the web.

Are you getting better results with vector search?

I've been looking at this problem with my search engine as well. I've recently side-loaded all of Stack Overflow and Stack Exchange, and searching in that part of the index is still not great at finding narrow results the way bigger search engines can, even though that should reasonably be possible.

I think, beyond the fact that my index is DIY and fairly crude, algorithms like BM25 are designed to identify topical keywords, and they do that rather well, but narrow searches go far beyond merely the topic and often involve words that aren't important to the document but are important to some particular context within it.
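
To make the BM25 point concrete, here is a minimal Python sketch of the textbook Okapi BM25 score. It is not the code behind any engine in this thread; the function name `bm25_scores`, the defaults `k1=1.5` and `b=0.75`, the non-negative IDF variant, and the pre-tokenized input are all assumptions for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against the query terms with BM25.

    query: list of terms; docs: list of token lists.
    Returns one float per document, in the same order as docs.
    """
    n = len(docs)
    if n == 0:
        return []
    avgdl = sum(len(d) for d in docs) / n
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue  # absent terms contribute nothing
            # Rare terms get a high IDF, common terms a low one.
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term frequency saturates via k1 and is normalized by length via b.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = [["a", "b", "c"], ["a", "a", "c"], ["x", "y", "z"]]
print(bm25_scores(["a"], docs))
```

The score rewards terms that are frequent in a document and rare in the collection, which is roughly what "topical keyword" means. A word that shows up once, deep in a long answer, gets little weight even when it is exactly what a narrow query is after.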

I may have some ideas to get around this, but they're fairly half-baked. Experiments are needed.

Not OP, but I am working on a search engine with vector ranking. Why do you say that vector search would help with narrow queries? In my experience, semantic search helps broaden the query to search for adjacent ideas without exact term matches.

Hybrid approaches that use vector search for broad matches and rerank using BM25 could be what you’re looking for. See https://blog.vespa.ai/efficient-open-domain-question-answeri...

> "Hybrid approaches that use vector search for broad matches and rerank using BM25"

Hybrid approaches, e.g. Learning To Rank, normally do it the other way around, given that the main benefit of hybrid is to mitigate the cost (time) of vector search. That is, use a non-vector search (e.g. BM25) to get a broadly relevant set of results first (and quickly), and then use the much more computationally expensive vector search to rerank the smaller result set. There are various approaches that try to make vector search more viable across large corpora, e.g. Locality Sensitive Hashing and Approximate Nearest Neighbour Search, but if you've implemented one of those then I'm not sure there'd be any benefit in retaining a hybrid approach.
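
A toy Python sketch of that ordering, cheap keyword retrieval first and vector rerank second. Everything here is an assumption for illustration: the function names, the term-count first stage standing in for BM25, the in-memory `docs` dict, and the hand-written two-dimensional "embeddings".

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def keyword_candidates(query_terms, docs, limit):
    """Cheap first stage: rank docs by raw query-term hits (a stand-in for BM25)."""
    scored = []
    for doc_id, doc in docs.items():
        tokens = doc["text"].lower().split()
        hits = sum(tokens.count(term) for term in query_terms)
        if hits:
            scored.append((hits, doc_id))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]


def hybrid_search(query_terms, query_vec, docs, limit=100, top_k=10):
    """Expensive second stage: rerank only the keyword candidates by similarity."""
    candidates = keyword_candidates(query_terms, docs, limit)
    reranked = sorted(
        candidates,
        key=lambda doc_id: cosine(query_vec, docs[doc_id]["vec"]),
        reverse=True,
    )
    return reranked[:top_k]


docs = {
    "a": {"text": "vector search search", "vec": [1.0, 0.0]},
    "b": {"text": "search engines", "vec": [0.0, 1.0]},
    "c": {"text": "cooking pasta", "vec": [0.0, 1.0]},
}
print(hybrid_search(["search"], [0.0, 1.0], docs))
```

Note the trade-off: document "c" matches the query vector perfectly but never reaches the reranker, because it has no keyword hit. The vector stage can only reorder what the first stage lets through.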

> Why do you say that vector search would help with narrow queries?

I was just asking whether he'd seen better results. I haven't experimented very much with it on my search engine. It's as crude as they get, and in part I want to see how far I can push old fashioned 1970s search algorithms :P

Vector search is good for broad searches. Narrow searching is a problem of crawling, not ranking, IMO. Teclis crawls a very particular and small portion of the web, which is the main reason it cannot find results for more specific searches.

Thanks for making Kagi! I hope you and your team can figure out a way to make a flat monthly fee feasible so I can continue using the site!

I was a little surprised to see Fandom.com results come up in one of my test searches, given that they are notorious for being very far from "clean" (I counted 25+ blocked by uBO when checking the page in Vivaldi, which is far above the threshold of 5 mentioned on your page). Might be worth looking at in more detail.

Also, the Marginalia Search link on the front page is broken.

Teclis is the name of a High Elf wizard in Warhammer (a miniature fantasy strategy game). Is that where the name comes from?

Yes, although I was more of a Wood Elf player in WHFB.