| HN Mirror



	by PhilipA 3081 days ago
	You need to look at other stuff than performance - relevancy is probably the biggest thing when implementing search. Is it more relevant than what you experienced with ES?

4 comments

agconti 3081 days ago

I'd argue that relevancy is more about your application's design than the underlying system retrieving the results. For example, putting the same dataset in Postgres or ES wouldn't make one deliver more relevant results given equal configurations.

You could lean on the relevancy strategies built in to ES, but in my experience you're better off understanding what relevancy means for your dataset and implementing a strategy yourself. Your mileage may vary, though; I'd never advocate reimplementing something that's already provided by your chosen tool.

The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases. If you're interested you can check out Postgres' search rank and query weighting configurations.

mickeyp 3081 days ago

I disagree.

I've had to build complex queries against ElasticSearch and it is specifically designed for things like this. We had custom weightings so when you searched for certain natural keys associated with each item they would rank above everything else, and that is easily do-able with ES. Simultaneously, we would weight results according to various metadata we had attached to each entry (audio stream languages, subtitles, content owner name, genre, etc.). And finally, if you searched for the name of the media (a movie or an episode in a TV show) the user would see all the matches ranked accordingly, but again weighted according to the content owner and various language features of that media file.

You can probably hack that together with PostgreSQL, but it is basically one big query in ES. PG's FTS is still great, but its use case is slightly different.

agconti 3081 days ago

That’s cool and I hear you; that’s a complex relevancy definition.

( Maybe surprisingly? ) This type of query is natively supported by Postgres. That support is robust and mature; you don’t have to hack it together.

ES is a great tool and it’s clear you’re a fan of it. If you’re interested, I’d recommend you look into Postgres’ capabilities. It’s not a replacement for ES by any means ( or even a competitor to it, in my opinion; Postgres isn’t even distributed ). But for specific use cases, you might find that Postgres’ capabilities surprise you!

mickeyp 3081 days ago

(I did leave out some bits that made it more complicated than I indicated.) Also, this was years ago; I know PG has improved its FTS a lot since then. ES is just a useful tool. If you can express your problem in relational terms then an RDBMS is almost always the right choice.

By the way, I am a huge fan of PG and relational databases in general; PG, especially, is a great database, and the first tool I reach for when it comes to data storage. However, we had other requirements (aside from the complexity I left out) to do with versioning and so forth that swung in favour of ES. Ultimately the problem with FTS in RDBMS, for me, boils down to doing FTS across disparate -- let's call them 'documents' -- stored across multiple tables. Basically you have to use materialised views (with manual refreshing) or complex join mechanics that affect performance. Perhaps PG 10 has improved in this area also?

pauloxnet 3080 days ago

In my article, the "document" contains data from different tables; I stored it in a dedicated column, and it's very fast with a GIN index on it.

brightball 3081 days ago

I was about to give him the same answer but you beat me to it. It’s not just throwing stuff in a field in PG. You can weight multiple bits of data. You can even define multiple different search vectors in separate columns if you want to use different search styles in different situations.
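
As a rough illustration of weighting multiple bits of data, here is a minimal Python sketch. It is not PostgreSQL's actual ranking formula; it only borrows the four weight labels A to D and the default weights that PostgreSQL documents for ts_rank (1.0, 0.4, 0.2, 0.1). The function name, the field contents, and the scoring rule are all hypothetical.

```python
# Default weights PostgreSQL documents for the labels A, B, C, D in ts_rank.
WEIGHTS = {"A": 1.0, "B": 0.4, "C": 0.2, "D": 0.1}


def weighted_score(query, fields):
    """Sum the label weight for every query-term occurrence in each field.

    `fields` maps a label ("A".."D") to that field's text. This is a toy
    stand-in for ranking a weighted search vector, not ts_rank itself.
    """
    terms = query.lower().split()
    score = 0.0
    for label, text in fields.items():
        tokens = text.lower().split()
        for term in terms:
            score += WEIGHTS[label] * tokens.count(term)
    return score


# Hypothetical documents: the title is labelled "A", the body "D".
title_hit = {"A": "postgres search", "D": "notes about django"}
body_hit = {"A": "django notes", "D": "using postgres search with django"}

# A match in the heavily weighted title outranks a match in the body.
assert weighted_score("postgres", title_hit) > weighted_score("postgres", body_hit)
```

The same idea extends to several separately weighted vectors: compute one score per vector and pick whichever suits the search style at hand.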

mickeyp 3081 days ago

Thanks! I replied to the OP; but weighting, although critical to our needs, was a small part of a larger problem space we had to solve. I hate introducing new technology unless it's strictly required, but we ran into limitations that forced us down the road of using ES.

pauloxnet 3081 days ago

This is true.

pauloxnet 3081 days ago

Thanks for your reply; my opinion is the same.

rpedela 3081 days ago

> You could lean on the relevancy strategies built in to ES, but in my experience you're better off understanding what relevancy means for your dataset and implementing a strategy yourself.

Ranking is hard. You SHOULD lean on the tools available in Lucene/Solr/ES. PG's ranking tools are a joke in comparison.

> The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases.

That simply isn't true.

pauloxnet 3081 days ago

I don't think the point of the article is that PG FTS is better than ES. In some situations, such as the one I illustrated in my article, you can implement the same search function with either of them, but if you already have PG in your stack, configuring and using it with Django is very simple and convenient.

threeseed 3081 days ago

> For example, putting the same dataset in Postgres or ES wouldn't make one deliver more relevant results given equal configurations.

This is not the case.

The OOTB search capabilities in ElasticSearch (even by default) far, far exceed what you get in PostgreSQL FTS.

Also you're completely contradicting yourself. You say you don't advocate reimplementing something provided by the tool but then suggest doing exactly that.

pauloxnet 3081 days ago

But in many situations you don't need all the ES features and you can implement quite a good and fast FTS function directly with your PostgreSQL database if you already use it in your stack.

burntsushi 3081 days ago

> The options and tools for configuring and tweaking relevancy between ES and PostgreSQL's FTS are surprisingly similar for many application use cases.

Maybe in very very limited scenarios, but in general, they aren't even close. PostgreSQL doesn't take corpus frequencies into account, which makes it pretty difficult to come anywhere near the relevance ranking quality of Elasticsearch (or any proper search engine).
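
To make the corpus-frequency point concrete, here is a small Python sketch comparing a raw term-count score with a tf-idf score. It is a toy under stated assumptions: whitespace tokenization, a three-document made-up corpus, and one common idf variant (log(N/df) + 1). It is not the scoring used by PostgreSQL, Lucene, or Elasticsearch.

```python
import math


def raw_count_scores(query, docs):
    """Score each document by how often the query terms occur in it."""
    terms = query.lower().split()
    return [sum(d.lower().split().count(t) for t in terms) for d in docs]


def tf_idf_scores(query, docs):
    """Score each document by summing tf * idf over the query terms."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for t in tokenized if term in t)
            if df == 0:
                continue
            # Rare terms get a larger idf; the +1 keeps common terms above zero.
            idf = math.log(n / df) + 1.0
            score += tokens.count(term) * idf
        scores.append(score)
    return scores


# Hypothetical corpus: "the" is in every document, "postgres" in only one.
docs = [
    "the the the database",
    "the postgres database",
    "the elasticsearch cluster",
]
raw = raw_count_scores("the postgres", docs)
weighted = tf_idf_scores("the postgres", docs)

# Raw counts prefer the document stuffed with "the"; tf-idf prefers the
# document that actually contains the rare term "postgres".
assert raw.index(max(raw)) == 0
assert weighted.index(max(weighted)) == 1
```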

In order to tell whether PG vs Elastic is appropriate for your use case, you need to do an evaluation. See: https://en.wikipedia.org/wiki/Text_Retrieval_Conference
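
A minimal sketch of what such an evaluation could look like in Python, assuming you have hand-made relevance judgments per query and the ranked result lists from each engine. Mean reciprocal rank is just one possible metric, and every query ID, document ID, and result list here is invented for illustration.

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/position of the first relevant result, or 0.0 if none appears."""
    for position, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(runs, judgments):
    """Average the reciprocal rank over every judged query.

    `runs` maps a query ID to an engine's ranked document IDs;
    `judgments` maps a query ID to the set of relevant document IDs.
    """
    total = sum(reciprocal_rank(runs.get(q, []), rel) for q, rel in judgments.items())
    return total / len(judgments)


# Hypothetical relevance judgments and result lists for two engines.
judgments = {"q1": {"d3"}, "q2": {"d1"}}
engine_a = {"q1": ["d3", "d2", "d1"], "q2": ["d2", "d1", "d3"]}
engine_b = {"q1": ["d1", "d2", "d3"], "q2": ["d3", "d2", "d1"]}

mrr_a = mean_reciprocal_rank(engine_a, judgments)
mrr_b = mean_reciprocal_rank(engine_b, judgments)

# Engine A surfaces the relevant documents earlier on this toy data.
assert mrr_a > mrr_b
```

Running the same judged queries against both systems gives a number to compare instead of an impression.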

RasputinsBro 3081 days ago

I second this. You can't let your queries take unusable amounts of time, but below a certain threshold relevancy is infinitely more important.

I'm putting together a product which has a search feature and that uses Django + MySQL and I'm struggling with relevancy. I'd happily accept 500ms queries if that guaranteed me the relevant hit would be on the first page. That's FAR more usable than 50ms queries and then the relevant hit is on page 5.

brightball 3081 days ago

Full text search in MySQL isn’t in the same ballpark as PG. That’s not a dig at MySQL, just praise for the quality of what you get from the PG implementation.

orf 3081 days ago

Why mysql? Search in pg is waaay better.

pauloxnet 3081 days ago

Yes, of course you need both of them, and that is what I found on my project with Django and PostgreSQL.

innagadadavida 3081 days ago

Second this, even if you are a PG fanboy and a search newbie, you need to pay attention to:

1. issues with i18n and l10n tokenization. Does PG support other languages?

2. At minimum you need to support tf-idf (or something better); it doesn't look like PG supports this either.

3. For extremely dumb ranking, you can have a render/engaged column in PG. For decent production stuff you need a decision tree ranker (or GBDT).

All in all, none of these are there in PG, I'm not familiar with Solr/Lucene either, but please educate yourselves before expressing such strong opinions marketed as the absolute truth.

pauloxnet 3081 days ago

PG FTS supports other languages: https://www.postgresql.org/docs/current/static/textsearch-ps... Anyway, the point of my article is not that PG FTS is better than ES, but that for quite a good and fast FTS function you can use only Django and PostgreSQL. Most of the time you don't need all the other ES features, and your stack will be easier to build and maintain.

pauloxnet 3081 days ago

I think search relevancy is very important, and I wrote in my article that switching to PG FTS let me work on search relevancy, because I had more time, which I had previously spent on ES configuration and on maintaining another layer in my stack.