|
|
|
|
|
by maxnov
4057 days ago
|
|
Thanks for the feedback! I tried out your suggestion by running the following: SELECT id, meta->>'title' as title, meta FROM(
SELECT id, meta FROM (
SELECT id, meta, tsv
FROM data_rows, plainto_tsquery('search query') AS q
WHERE (tsv @@ q)
) AS t1 ORDER BY ts_rank_cd(t1.tsv, plainto_tsquery('search query')) DESC LIMIT 5
) AS t2;
Please let me know if this isn't what you meant. The above did improve the speed but only a little. I think for our use case this approach wouldn't be entirely necessary but maybe if we had a larger data set then it'd be worth implementing. |
|
Note that you don't need to call plainto_tsquery twice, you can use the one you've already declared.
And I notice you are returning the JSON document whole, have you checked whether it's faster to extract the title here or later in whatever your programming language is?
The giveaway clue in your post btw is this: "We experimented with returning the document text with results and found that if we returned the full text of documents, it added about 350ms to the query time which seems to be network overhead rather than a slower query.".
As soon as you notice a linear slowdown, when you're only returning 5 rows whatever happens... then internally you must be doing something in the query that relates to all rows.
PostgreSQL fulltext scales nicely, I use it on a platform driving over 300 web sites with 10s of millions of comments. I originally used it thinking it will buy me time before I have to add a dedicated search engine, and it helps now as I store permissions knowledge in the database too... but truth is, it's just not been an issue yet. I'm not needing to consider anything else as it's performing great. The slowest part of my query isn't fulltext search, it's my permissions stuff (because I've gone down the rabbit hole of users, groups, roles, etc and this is calculated for all content), but still the 95th percentile for searches is typically under 10ms.