You need to look at other stuff than performance - relevancy is probably the biggest thing when implementing search. Is it more relevant than what you experienced with ES?
I'd argue that relevancy is more your application application's design then the underlying system retrieving the results. For example, putting the same dataset in Postgres or ES wouldn't make one deliver more relevant results given equal configurations.
You could lean on the relevancy strategies built in to ES, but in my experience you're better off understanding what relevancy means for your dataset and implementing a strategy yourself. Your millage may vary though, I'd never advocate reimplenting something that's already provided by your chosen tool.
The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases. If you're interested you can check out Postgres' search rank and query weighting configurations.
I've had to build complex queries against ElasticSearch and it is specifically designed for things like this. We had custom weightings so when you searched for certain natural keys associated with each item they would rank above everything else, and that is easily do-able with ES. Simultaneously, we would weigh results according to various metadata we had attached to each entry (audio stream languages, subtitles, content owner name, genre, etc.). And finally, if you searched for the name of the media (a movie or an episode in a TV show) the user would see all the matches ranked accordingly, but again weighed according to the content owner and various language features of that media file.
You can probably hack that together with PostgreSQL, but is basically one big query in ES. PG's FTS is still great; but its use-case is slightly different.
That’s cool and I hear you; that’s a complex relevancy definition.
( Maybe surprisingly? ) This is type of query is natively supported by Postgres. That support is robust and mature, you don’t have to hack it together.
ES is a great tool and it’s clear your a fan of it. If you’re interested, I’d recommend you look into Postgres capabilities. It’s not a replacement for ES by any means ( or even a competitor to in my opinion; Postgres isn’t even distributed ). But for specific use cases, you might find that Postgres capabilities surprise you!
(I did leave out some bits that made it more complicated than I indicated.) Also, this was years ago; I know PG has improved its FTS a lot since then. ES is just a useful tool. If you can express your problem in relational terms then a RDBMS is almost always the right choice.
By the way, I am a huge fan of PG and relational databases in general; PG, especially, is a great database, and the first tool I reach for when it comes to data storage. However, we had other requirements (aside from the complexity I left out) to do with versioning and so forth that swung in favour of ES. Ultimately the problem with FTS in RDBMS, for me, boils down to doing FTS across disparate -- let's call them 'documents' -- stored across multiple tables. Basically you have to use materialised views (with manual refreshing) or complex join mechanics that affect performance. Perhaps PG 10 has improved in this area also?
I was about to give him the same answer but you beat me to it. It’s not jus throwing stuff in a field in PG. You can weight multiple bits of data. You can even define multiple different search vectors in separate columns if you want to use different search styles in different situations.
Thanks! I replied to the OP; but weighing, although critical to our needs, was a small part of a larger problem space we had to solve. I hate introducing new technology unless it's strictly required, but we ran into limitations that forced us to down the road of using ES.
> You could lean on the relevancy strategies built in to ES, but in my experience you're better off understanding what relevancy means for your dataset and implementing a strategy yourself.
Ranking is hard. You SHOULD lean on the tools available in Lucene/Solr/ES. PG's ranking tools are a joke in comparison.
> The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases.
I don't the sense of the article is that PG FTS is better than ES , but in some situation, as the one I illustrated in my article, you can implement a the same search function with both of them, but with if have PG already in your stack configuring and using it with Django is very simple and convenient.
> For example, putting the same dataset in Postgres or ES wouldn't make one deliver more relevant results given equal configurations.
This is not the case.
The OOTB search capabilities in ElasticSearch (even by default) far, far exceed what you get in PostgreSQL FTS.
Also you're completely contradicting yourself. You say you don't advocate reimplmenting something provided by the tool but then suggest doing exactly that.
But in many situation you don't need all the ES features and you can implement a quite good and fast FTS function directly with your PostgreSQL database if you already use it in your stack.
> The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases.
Maybe in very very limited scenarios, but in general, they aren't even close. PostgreSQL doesn't take corpus frequencies into account, which makes it pretty difficult to come anywhere near the relevance ranking quality of Elasticsearch (or any proper search engine).
I second this. You can't let your queries take unusable amounts of time, but below a certain threshold relevancy is infinitely more important.
I'm putting together a product which has a search feature and that uses Django + MySQL and I'm struggling with relevancy. I'd happily accept 500ms queries if that guaranteed me the relevant hit would be on the first page. That's FAR more usable than 50ms queries and then the relevant hit is on page 5.
Full text search in MySQL isn’t in the same ballpark as PG. Thats not a dig at MySQL, just praise for the quality of what you get from the PG implementation.
Second this, even if you are a PG fanboy and a search newbie, you need to pay attention to:
1. issues with i18n and l10n tokenization. Does PG support other languages?
2. At minimum you need to support tf-idf (or something better), it doesn't look like PG supports this either.
3. For extremely dumb ranking, you can have a render/engaged column in PG. For decent production stuff you need a decision tree ranker (or GBDT).
All in all, none of these are there in PG, I'm not familiar with Solr/Lucene either, but please educate yourselves before expressing such strong opinions marketed as the absolute truth.
PG FTS support other languages https://www.postgresql.org/docs/current/static/textsearch-ps...
Anyway the point of my article is not that PG FST is better than ES, but that for a quite good and fast FTS function you can use only Django and PostgreSQL and most of the time you don't need all the other ES features and at the same time your stack will be easier to build and maintain.
I think search relevancy is very important, and I wrote in my article start using PG FTS had permitted to work on search relevancy because I had more time which I used before in ES configuration and maintain another layer in my stack.
You could lean on the relevancy strategies built in to ES, but in my experience you're better off understanding what relevancy means for your dataset and implementing a strategy yourself. Your millage may vary though, I'd never advocate reimplenting something that's already provided by your chosen tool.
The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases. If you're interested you can check out Postgres' search rank and query weighting configurations.