HN Mirror



	by norwalkbear 1404 days ago
	Has anyone overcome the 16382 positional limit of tsvector? That and the automatic stemming and lemmatization of search words, even in phrase searches, make Postgres awful for any software where accurate search is critical.

4 comments

hombre_fatal 1404 days ago

I remember when I ran into that first issue on my forum, I felt like I was the first person to ever use Postgres full-text search. (8?) Years ago, googling the exact error brought up nothing except some dev email/listserv chatter. Nobody else was indexing longer text documents? Wasn't very encouraging.

Oh well. The vast majority of forum posts don't hit that limit so I just excluded the exceptionally long posts from the index. I never revisited it again.


jmull 1404 days ago

I don’t know about the positional limits, but the built-in stemming and other transformations are a default, which you can change. I think this is the relevant mechanism: https://www.postgresql.org/docs/current/textsearch-dictionar...
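
To illustrate the point about changing the default: one way to avoid stemming is to pass the built-in `simple` text search configuration explicitly to `to_tsvector` and `phraseto_tsquery`, which lowercases tokens but does not stem them. The sketch below only builds the query text and does not talk to a database. The helper name `phrase_search_sql`, the table `posts`, the column `body`, and the `%s` placeholder style (as used by psycopg) are assumptions for illustration, not something from this thread.

```python
def phrase_search_sql(table: str, column: str, config: str = "simple") -> str:
    """Build a phrase-search query that uses an explicit text search config.

    Hypothetical helper: 'simple' is a built-in PostgreSQL configuration
    that does not stem words, unlike 'english'.
    """
    # Crude guard so only plain identifiers are interpolated into the SQL.
    for name in (table, column, config):
        if not name.isidentifier():
            raise ValueError(f"unsafe identifier: {name!r}")
    # The search phrase itself is left as a driver placeholder (%s).
    return (
        f"SELECT * FROM {table} "
        f"WHERE to_tsvector('{config}', {column}) "
        f"@@ phraseto_tsquery('{config}', %s)"
    )


if __name__ == "__main__":
    # 'posts' and 'body' are made-up names for the example.
    print(phrase_search_sql("posts", "body"))
```

An expression index built with the same configuration would be needed for this query to use an index; a mismatch between the indexed and queried configuration would fall back to a sequential scan.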


iav 1404 days ago

I used a trigger function to detect long text and trim the source before it got indexed. That meant that any text over 500kb just got dropped from the index. I also used one index per long text field rather than combining with other fields.
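
A rough application-side sketch of the same trimming idea, for anyone who would rather do it before the text reaches the database than in a trigger. This is not iav's trigger function; the name `trim_for_index` and the 500,000-byte default are assumptions chosen to mirror the "500kb" figure above.

```python
def trim_for_index(text: str, max_bytes: int = 500_000) -> str:
    """Return text cut to at most max_bytes of UTF-8 for indexing.

    Hypothetical helper: the limit is an assumed value, not a
    PostgreSQL constant.
    """
    encoded = text.encode("utf-8")
    if len(encoded) <= max_bytes:
        return text
    # The cut may land inside a multi-byte character; errors="ignore"
    # drops that incomplete trailing sequence instead of raising.
    return encoded[:max_bytes].decode("utf-8", errors="ignore")


if __name__ == "__main__":
    long_post = "word " * 200_000          # about 1 MB of text
    trimmed = trim_for_index(long_post)
    print(len(trimmed.encode("utf-8")))    # at most 500000
```

The trimmed string would then be what gets passed to `to_tsvector`, while the full text stays in its own column.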


manigandham 1404 days ago

The lack of BM25 relevance scoring is the bigger problem. Postgres FTS is fine as a better "LIKE" filter with very little overhead, but it's a poor choice for serious search applications or scale.
