Hacker News new | ask | show | jobs
by simonw 1144 days ago
PostgreSQL FTS is mostly great - I wrote a tutorial on using it to build faceted search with Django a few years ago: https://simonwillison.net/2017/Oct/5/django-postgresql-facet...

It does have one surprising limitation: it calculates relevance based on just the current row, rather than being able to take statistics across the whole corpus into account.

Most search engines use TF/IDF or BM25 for relevance calculations, which consider the relative common-ness of terms in comparison to the rest of the corpus. PostgreSQL FTS can't do that as far as I know.

SQLite's built-in FTS CAN do relevance calculations like this! Surprising to see a feature as significant as that show up in SQLite but not in PostgreSQL.

2 comments

I hope someone implements BM25 and combines it with Pgvector to bring hybrid search to Postgres. I feel like that is the jsonb of the next couple of years.
> Most search engines use TF/IDF or BM25 for relevance calculations ...

Ahhh. That's what's SQLite's "bm25()" function is for.

Was white listing the SQLite FTS functions a few days ago for DBHub.io (eg so people can use them), but don't really understand (yet) how most of them are actually used. Some day I'll get around to learning about them properly. :)