by ngrilly 3552 days ago
Did you store the plain text of each PDF in PostgreSQL, or only the tsvector computed from it?

IIRC, I stored the plain text too, because once a document matches in the tsvector, the engine can return that plain text with the matching terms marked up in context.
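A minimal sketch of that setup, written as Python that only builds the SQL strings so it runs without a database. The table name `documents` and the columns `body` (extracted PDF text) and `body_tsv` are hypothetical. The generated column assumes PostgreSQL 12 or later, and the `'english'` text search configuration is an assumption too. The tsvector handles matching and ranking, while the plain-text column is what `ts_headline` marks up.

```python
# Hypothetical schema: the extracted PDF text lives in "body"; "body_tsv" is a
# generated tsvector column (generated columns require PostgreSQL 12+).
SCHEMA_SQL = """
CREATE TABLE documents (
    id       serial PRIMARY KEY,
    body     text NOT NULL,  -- plain text, kept so ts_headline can build snippets
    body_tsv tsvector GENERATED ALWAYS AS (to_tsvector('english', body)) STORED
);
CREATE INDEX documents_body_tsv_idx ON documents USING gin (body_tsv);
"""


def search_query(terms: str) -> tuple[str, tuple[str]]:
    """Return a parameterized query (psycopg-style %s placeholder) and its params.

    Matching and ranking use the tsvector column; the snippet is produced by
    ts_headline from the stored plain text.
    """
    sql = """
        SELECT id,
               ts_headline('english', body, q) AS snippet
        FROM documents, plainto_tsquery('english', %s) AS q
        WHERE body_tsv @@ q
        ORDER BY ts_rank(body_tsv, q) DESC
        LIMIT 10;
    """
    return sql, (terms,)
```

With a driver such as psycopg2, the result could be passed straight to `cursor.execute(*search_query("full text search"))`. Passing the search terms as a parameter, rather than formatting them into the string, keeps user input out of the SQL text.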
You're right, PostgreSQL needs the plain text to highlight it with ts_headline. It's similar to Elasticsearch keeping the original document in the _source attribute. Thanks!
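To see why the highlighting step needs the original text, here is a toy Python illustration. It is not PostgreSQL's algorithm: `toy_tsvector` only lowercases words and drops a tiny, made-up stop-word list (the real parser also stems through dictionaries), and `toy_headline` just wraps matches in `<b>`/`</b>`, which are `ts_headline`'s default markers. The point is that the vector keeps only normalized lexemes and positions. Casing, punctuation and stop words are gone, so a readable snippet has to come from the stored plain text.

```python
import re

STOP_WORDS = {"the", "a", "of", "and", "to", "in"}  # tiny illustrative list


def toy_tsvector(text: str) -> dict[str, list[int]]:
    """Map each normalized lexeme to its word positions, loosely like a tsvector.

    Positions count every word, including dropped stop words. No stemming is done.
    """
    vector: dict[str, list[int]] = {}
    for pos, word in enumerate(re.findall(r"\w+", text), start=1):
        lexeme = word.lower()
        if lexeme in STOP_WORDS:
            continue
        vector.setdefault(lexeme, []).append(pos)
    return vector


def toy_headline(text: str, term: str, start_sel: str = "<b>", stop_sel: str = "</b>") -> str:
    """Wrap case-insensitive whole-word matches of term in the ORIGINAL text."""
    pattern = re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)
    return pattern.sub(lambda m: f"{start_sel}{m.group(0)}{stop_sel}", text)


doc = "The Quick Fox, and the quick dog."
print(toy_tsvector(doc))           # {'quick': [2, 6], 'fox': [3], 'dog': [7]}
print(toy_headline(doc, "quick"))  # The <b>Quick</b> Fox, and the <b>quick</b> dog.
```

Nothing in `{'quick': [2, 6], 'fox': [3], 'dog': [7]}` lets you rebuild "The Quick Fox, and the quick dog.", which is the same reason Elasticsearch keeps `_source` around.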