| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by jillesvangurp 823 days ago

> not to mention its ElasticSearch grade full-text search capabilities.

I played with postgresql a while ago to implement search. It's not horrible. But it's nowhere near Elasticsearch in terms of its capabilities. It's adequate for implementing very narrow use cases where search ranking really doesn't matter much (i.e. your revenue is not really impacted by poor precision and recall metrics). If your revenue does depend on that (e.g. because people buy stuff that they find on your website), you should be a bit more careful about monitoring your search performance and using the right tools to improve performance.

But for everything else you only have a handful of tools to work with to tune things. And what little there is is hard to use and kind of clunky. Great if that really is all you need and you know what you are doing but if you've used Elasticsearch and know how to use it properly you'll find your self missing quite a few things. Maybe some of those things will get added over time but for now it simply does not give you a lot to work with.

That being said, if you go down that path the trigram support in postgres is actually quite useful for implementing simple search. I went for that after trying the very clunky tsvector support and finding it very underwhelming for even the simplest of use cases. Trigrams are easier to deal with in postgres and you can implement some half decent ranking with it. Great for searching across product ids, names, and other short strings.

2 comments

Vonng 823 days ago

That's true for the kernel, How about extensions such as ParadeDB BM25 https://www.paradedb.com/ + PGroonga https://pgroonga.github.io/ + PG Bigm https://github.com/pgbigm/pg_bigm ?

link

kiwicopple 823 days ago

also with pg_trgm[0] (mentioned by OP) and pgvector for semantic search you have a pretty powerful search toolkit. for example, combining them for Hybrid Search [1]

[0] https://www.postgresql.org/docs/current/pgtrgm.html

[1] Reciprocal Ranked Fusion: https://supabase.com/docs/guides/ai/hybrid-search

link

mistrial9 823 days ago

the difference is cached/pre-calc results in a (big on disk, expensive to compute) inverted index.. you cannot beat that at runtime

link

jillesvangurp 823 days ago

Still very limited and frankly all a bit low level primitives. Unless you are a search expert, you won't be able to do much productive with this stuff. If you are, it might fit a few use cases. But then, why limit yourself to just this stuff?

The point of that of course being that the target audience for this stuff is actually people that for whatever reason are a bit shy using the right tools for the right job here and are probably lacking a lot of expertise. The intersection of people with the expertise that would be happy with this narrow subset of functionality is just not a lot of people.

link

hans_castorp 823 days ago

> I played with postgresql a while ago to implement search. It's not horrible. But it's nowhere near Elasticsearch in terms of its capabilities.

The article is referring to the ParadeDB extension, not the built-in full text search

link