Hacker News new | ask | show | jobs
by some_developer 1781 days ago
Anectodal note:

A few years ago we added yet-another part to our product and, whilst ES worked "okay", we got a bit weary of ES due to "some issues" (some bug in the architecture keeping things not perfect in sync, certain queries with "joins" of types taking long, demand on HW due to the size of database, no proper multi-node setup due to $$$ and time constraint, etc.; small things piling up over time).

Bright idea: let's see how far Postgres, which is our primary datastore, can take us!

Unfortunately, the feature never made it fully into production.

We thought that on paper, the basic requirements were ideal:

- although the table has multiple hundreds of millions of entries, natural segmentation by customer IDs made possible individual results much smaller

- no weighted search result needed: datetime based is perfect enough for this use-case, we thought it would be easy to come up with the "perfect index [tm]"

Alas, we didn't even get that far:

- we identified ("only") 2 columns necessary for the search => "yay, easy"

- one of those columns was multi-language; though we didn't have specific requirements and did not have to deal with language specific behaviour in ES, we had to decide on one for the TS vectorization (details elude me why "simple" wasn't appropriate for this one column; it was certainly for the other one)

- unsure which one, or both, we would need, for one of the columns we created both indices (difference being the "language")

- we started out with a GIN index (see https://www.postgresql.org/docs/9.6/textsearch-indexes.html )

- creating a single index took > 15 hours

But once the second index was done, and had not even rolled out the feature in the app itself (which at this point was still an ever changing MVP), unrelated we suddenly got hit by lot of customer complains that totally different operations on this table (INSERTs and UPDATEs) started to be getting slow (like 5-15 seconds slow, something which usually takes tiny ms).

Backend developer eyes were wide open O_O

But since we knew that second index just finished, after checking the Posgres logs we decided to drop the FTS indices and, lo' and behold, "performance problem solved".

Communication lines were very short back then (still are today, actually) and it was promptly decided we just cut the search functionality from this new part of the product and be done with it. This also solved the problem, basically (guess there's some "business lesson" to be learned here too, not just technical ones).

Since no one within the company counter argued this decision, we did not spend more time analyzing the details of the performance issue though I would have loved to dig into this and get an expert on board to dissect this.

--

A year later or so I had a bit free time and analyzed one annoying recurring slow UPDATE query problem on a completely different table, but also involving FTS on a single column there also using a GIN index. That's when I stumble over https://www.postgresql.org/docs/9.6/gin-implementation.html

> Updating a GIN index tends to be slow because of the intrinsic nature of inverted indexes: inserting or updating one heap row can cause many inserts into the index (one for each key extracted from the indexed item). As of PostgreSQL 8.4, GIN is capable of postponing much of this work by inserting new tuples into a temporary, unsorted list of pending entries. When the table is vacuumed or autoanalyzed, or when gin_clean_pending_list function is called, or if the pending list becomes larger than gin_pending_list_limit, the entries are moved to the main GIN data structure using the same bulk insert techniques used during initial index creation. This greatly improves GIN index update speed, even counting the additional vacuum overhead. Moreover the overhead work can be done by a background process instead of in foreground query processing.

In this particular case I was able to solve the occasional slow UPDATE queries with "FASTUPDATE=OFF" on that table and, thinking back about the other issue, it might have solved or minimized the impact.

Back to the original story: yep, this one table can have "peaks" of inserts but it's far from "facebook scale" or whatever, basically 1.5k inserts / second were the absolute rare peak I measured and usually it's in the <500 area. But I guess it was enough for this scenario to add latency within the database.

--

Turning back my memory further, I was always "pro" trying to minimize / get rid of ES after learning about http://rachbelaid.com/postgres-full-text-search-is-good-enou... even before we used any FTS feature. At also mentions the GIN/GiST issue but alas, in our case: ElasticSearch is good enough and, besides the thwarts we've with it, actually easier to reason about (so far).