Hacker News new | ask | show | jobs
by jfbaro 1433 days ago
I have seen this presentation from a company that used PG Full Text search for a pretty complex use case. Interesting -> http://matheusoliveira.s3-website-us-east-1.amazonaws.com/pr... (updated thanks to ddevault)
2 comments

Updated! Thanks
Good link thinks. Have to say we're moving to pgSearch after implementing SOLR alongside a PG/Rails backend. Makes the stack simpler, less components to worry about (less headaches for gem versioning dependencies with SOLR). Lot to be said for it after reading through what's available now...
are you planning to do relevance ? this is one of the blockers for us - TFIDF vs BM25 or something.

From what i know - PG fulltext seaarch does not implement relevance.

An alternative is to just layer the FTS on top of vanilla sql to get the extra stuff (this is what I do for my eCommerce backend), so is pretty simple to have something alike:

  SELECT ..
    -- Get the fts 
    IN ( FTS QUERY)
  ORDER BY
     -- The relevance is hardcoded? mayber in another table that store th rankings?
     (
      Products, 
      Inventory,
      Invoices,..
      )
I found is much easier and predictable if I code the "rankings" based on the business logic instead of let the FTS engine guess it. You can store that stuff as normal columns or use the "sources" (ie: products, inventory) as ways to know what could be more important to pull first.

This have the nice property that our search results are ver good and better: Never return non-sensical stuff! (like searching for a apple in the store and get and blog post!)

i think we are talking a little bit different. it doesnt matter what the variables are (your business logic) or some other variables.

Given a certain variable, tfidf/bm-25 will order by relevance and not by match. So it answers the question, what if the name match was off by two characters and the inventory number is less than 200.

tf-idf does not tell you what to order by...but it takes care of all the edge cases of ordering.

now if ur not using text match anywhere and only using business variables...then this entire thread is not for u. But FTS and lucene attack full text search primarily, and that's where the relevance vs ordering discussion comes from

This has been my understanding of the state of Postgres full-text search. It's great if your search requirements are fairly vanilla, but I haven't seen any solutions for more advanced search needs, such boosting, relevance, scoring, etc.
ts_rank functions use the term frequency within that document. not a global term frequency (which is why u need a separate index like what elasticsearch does).

this is important, cos if a word is too common, its considered less significant for a document match. When we calculate IDF, it will be very low for the most occurring words such as stop words (“is” is present in almost all of the documents, and tf-idf will give a very low value to that word).

there's someone who implemented this, its pretty cool. but definitely performance takes a hit versus a separate elasticsearch cluster. https://codebots.com/crud/How-to-efficiently-search-text-usi...