Hacker News new | ask | show | jobs
by wkoszek 982 days ago
Hey guys. Congratulations - this is an exciting development. Can you show some benchmarks around showing the count of matches -- `select count() from table where text match is there`?

This was the top reason that made us (Segmed.ai) give up on PostgreSQL FTS -- our folks require a very exact count of matches for medical conditions that are present in 20M reports. And doing COUNT() in PostgreSQL was crazy, crazy slow. If your extension could do simple len(invertedindex[word]) that would already be a great improvement.

ELK has it immediately, but at a cost of being one more thing to maintain, and the whole Logstash thing is clunky. I'd love to use FTS inside of PostgreSQL.

2 comments

I’m not sure if Postgres could support that type of operation directly via count() since I don’t know if the fact that no other filters are present is available to the Index Access Method API.

It might be possible to do a separate function though, like:

select pg_bm25_direct_count(‘term’)*

If you do that, I can update postgres-searchbox [1] to use it for better frontend experience.

[1] https://www.npmjs.com/package/postgres-searchbox

That would be fine--basically any way of achieving it would be fine. As of now, in PostgreSQL's FTS, I don't think there's any way to do this fast enough to give it back to the user.
Thanks!

We released support for metrics aggregations a few days ago, including count: https://docs.paradedb.com/aggregations/metrics#count.

We haven't gotten around to benchmarking aggregations - that's the focus for next week and we'll publish them once they're done. I would suspect that it's a lot faster than Postgres aggregates since it leverages Tantivy Columnar.

Nice! I would be very interested by your benchmark, don't hesitate to jump in the quickwit discord server to talk about the results. https://discord.quickwit.io/