Hacker News
by philippemnoel 984 days ago
One of the ParadeDB authors here, hey! Thanks for pointing this out, you're completely right. That's an oversight on our end. We'll update the benchmarks and re-run them to correct this :)

Great to hear! A benchmark against trigram search with a GIN index would also be great. There are multiple ways to do full-text search with Postgres, and they're all insanely fast and memory-efficient. Benchmarking the various methods against each other would be helpful.

https://www.crunchydata.com/blog/postgres-full-text-search-a...
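For anyone unfamiliar with how trigram search works, here is a rough conceptual sketch in Python of what Postgres's `pg_trgm` extension does: each word is lowercased and padded with two leading spaces and one trailing space, every 3-character window becomes a trigram, and similarity is the number of shared trigrams divided by the size of the union. The inverted index stands in for what a GIN index over trigrams stores (trigram to matching rows), followed by a recheck against a threshold (0.3 mirrors `pg_trgm`'s default `similarity_threshold`). The function names are hypothetical, non-ASCII handling is simplified, and the real extension's GIN support differs in detail.

```python
import re
from collections import defaultdict

def trigrams(text: str) -> set[str]:
    # Lowercase, split into words on non-alphanumeric characters (ASCII only
    # here, a simplification), pad each word with two leading spaces and one
    # trailing space, then collect every 3-character window.
    result: set[str] = set()
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by the size of the union of both trigram sets.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def build_index(docs: list[str]) -> dict[str, set[int]]:
    # Inverted index: trigram -> ids of the documents containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, doc in enumerate(docs):
        for t in trigrams(doc):
            index[t].add(doc_id)
    return index

def search(query: str, docs: list[str], index: dict[str, set[int]],
           threshold: float = 0.3) -> list[tuple[int, float]]:
    # Step 1: candidates are documents sharing at least one trigram with the query.
    candidates: set[int] = set()
    for t in trigrams(query):
        candidates |= index.get(t, set())
    # Step 2: recheck each candidate with the exact similarity, keep those above the threshold.
    scored = [(doc_id, similarity(query, docs[doc_id])) for doc_id in candidates]
    return sorted((s for s in scored if s[1] >= threshold), key=lambda s: -s[1])
```

For example, `trigrams("cat")` gives `"  c"`, `" ca"`, `"cat"` and `"at "`, and searching `["postgres", "gin index"]` for the misspelling `"postgre"` matches only the first document (7 shared trigrams out of 10 total).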

Thanks for sharing! We'll look into adding a benchmark for that as well.

I learned the hard way that GIN updates are too slow. In my case the load was not even 100 updates per second on average, though it could peak at 1,000.

How does pg_bm25 compare here in terms of index maintenance and performance?
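Part of why GIN updates hurt is that a single row can touch many index entries (one per distinct term). PostgreSQL's GIN "fast update" option softens this by appending new entries to an unsorted pending list and merging them into the main structure in bulk later (during vacuum/autoanalyze or once the list grows past a size limit), at the price of searches also having to scan that list. The toy Python class below is a hypothetical illustration of that mechanism only: it does not model real costs, and `pending_limit` counts entries, unlike PostgreSQL's size-based limit.

```python
from collections import defaultdict

class ToyGinIndex:
    """Hypothetical toy inverted index with a GIN-style pending list (not PostgreSQL code)."""

    def __init__(self, fast_update: bool = True, pending_limit: int = 4):
        self.fast_update = fast_update
        self.pending_limit = pending_limit  # counted in entries here, a simplification
        self.main: dict[str, set[int]] = defaultdict(set)
        self.pending: list[tuple[str, int]] = []
        self.merges = 0

    def insert(self, row_id: int, terms: set[str]) -> None:
        if not self.fast_update:
            # Without fast update, every term's posting list is updated right away.
            for term in terms:
                self.main[term].add(row_id)
            return
        # With fast update, just append to the unsorted pending list.
        self.pending.extend((term, row_id) for term in terms)
        if len(self.pending) >= self.pending_limit:
            self.flush()

    def flush(self) -> None:
        # Move all pending entries into the main structure in one bulk pass.
        for term, row_id in self.pending:
            self.main[term].add(row_id)
        self.pending.clear()
        self.merges += 1

    def lookup(self, term: str) -> set[int]:
        # Searches must check both the main structure and the unmerged pending list.
        hits = set(self.main.get(term, set()))
        hits.update(row_id for t, row_id in self.pending if t == term)
        return hits
```

The design trade-off is the one the docs describe: buffered inserts are cheaper per row, but lookups pay for scanning whatever has not been merged yet.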

If I'm understanding your experience correctly, the conventional wisdom here is to use GIN for static data and GiST for dynamic data.

> In choosing which index type to use, GiST or GIN, consider these performance differences:

> GIN index lookups are about three times faster than GiST

> GIN indexes take about three times longer to build than GiST

> GIN indexes are moderately slower to update than GiST indexes, but about 10 times slower if fast-update support was disabled (see Section 54.3.1 for details)

> GIN indexes are two-to-three times larger than GiST indexes

> As a rule of thumb, GIN indexes are best for static data because lookups are faster. For dynamic data, GiST indexes are faster to update. Specifically, GiST indexes are very good for dynamic data and fast if the number of unique words (lexemes) is under 100,000, while GIN indexes will handle 100,000+ lexemes better but are slower to update.

https://www.postgresql.org/docs/9.1/textsearch-indexes.html
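The rule of thumb quoted above boils down to a small decision function. This is a hypothetical Python sketch of one reading of it (the function name and the "frequently updated" flag are my own framing); the case of dynamic data with 100,000+ lexemes is a genuine trade-off in the docs, so treat the output as a starting point for benchmarking, not an answer.

```python
def suggest_fts_index(unique_lexemes: int, frequently_updated: bool) -> str:
    """Rough encoding of the GIN-vs-GiST rule of thumb from the PostgreSQL 9.1 docs."""
    if not frequently_updated:
        # Static data: GIN lookups are about three times faster than GiST.
        return "GIN"
    if unique_lexemes < 100_000:
        # Dynamic data with a modest vocabulary: GiST is faster to update.
        return "GiST"
    # Dynamic data with 100,000+ lexemes: the docs say GIN handles large
    # vocabularies better, at the cost of slower updates.
    return "GIN"
```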