|
I had used PostgreSQL for a decade, including full-text search, but just within apps that were already storing their data in Postgres. The time came to replace our website search (tens of thousands of pages), and we decided to try rolling our own. Someone suggested ElasticSearch, and as I read through it, it seemed to do less than PostgreSQL. I still had the hard problems of (1) spidering the site and (2) converting all the file formats (.doc, .xls, .pdf, etc.). I ended up just putting wget on a daily cron job to spider the site. Then I ran the saved files through a hodgepodge of scripts to extract the plain text and put it into PostgreSQL. Once it's there, it's far easier to do the rest. Postgres has its own functions to search for matches, rank the matches, give you snippets, and even highlight the search words in the snippets. It's amazing. Searches run in a split second. Well, at first, when I was testing, they often took a few seconds. But the weird thing is that after go-live it ran faster. My best guess is that so many users caused Postgres to cache more and more of itself into RAM. The whole server is still using less than 1 GB though, and it's running Apache and Postgres for the website and all its apps. |