Hacker News new | ask | show | jobs
by gutensearch 1965 days ago
Thanks for the suggestions! I will try them next.

> Looking at your where clause, did you index language and textsearchable_index_col together ? If not, you are giving the planner the opportunity to skip your full text index.

Here is the index:

  update gutenberg.paragraphs
  set textsearchable_index_col = to_tsvector(b.cfgname::regconfig, coalesce(paragraph, ' '))
  from
  gutenberg.all_data a 
  inner join pg_ts_config b on lower(a.language) = b.cfgname 
  where gutenberg.paragraphs.num = a.num;
  create index textsearch_paragraph_idx on gutenberg.paragraphs using gin (textsearchable_index_col);
> if you order by ts_rank_cd then the query will sort all rows, including those that don't match. A better pattern is to take the ranking score as a column and then sort it in a subquery. [1] from stack overlfow has an example. (As an aside, from pg2, CTEs are no longer an optimization fence, so you can write the query out more cleanly with a CTE and still get the desired performance).

Ranking was already done as a column in a CTE and ranked further down:

  with paragraphs as (
                select 
                num
                , paragraph
                , ts_rank_cd(textsearchable_index_col, phraseto_tsquery(%s::regconfig, %s), 32) as rank 
                , ts_headline(%s, paragraph, phraseto_tsquery(%s::regconfig, %s), 'MaxFragments=1000, StartSel=**, StopSel=**') as highlighted_result 
                from gutenberg.paragraphs 
                where language = %s::regconfig and textsearchable_index_col @@ phraseto_tsquery(%s::regconfig, %s)
            ) 
I did rewrite the query as per your [1] but EXPLAIN ANALYZE found no difference.

Pagination is a tricky one. I was in a rush and hit an issue where building large Dash tables was very expensive and prone to crashing. I initially set up pagination as per the Dash examples, but that didn't play nice with markdown which I needed for the URLs and highlighting (giving these up for plain text tables made in HN-fast, but it confused my trial users). So the quick and dirty solution "to ship now" was to set up custom pagination in the SQL and show the results.

I think that if there is enough interest in this project, the front end will have to move off Dash.