by natdempk 625 days ago
Well, the problem you run into is that you kind of want different datastores for different use cases, for example search vs. specific page loads. You want both to stay consistent, but no single DB serves both use cases well (often it's a primary DB plus Elasticsearch). If you don't keep them consistent, you get user-facing bugs where a user can update a record but not find it in search immediately. If you instead load everything from ES to give the user a consistent view, updates can disappear on refresh. And if you write to both SQL and ES in one API request, they can desync when one of the writes fails. The problem is less the complexity of keeping the index up to date in real time and more that the ES index isn't consistent with the primary DB, so to a user they are just different parts of your app that seem a little broken in subtle, inconsistent ways. It would be great to have everything present a consistent view to users that updates together on write.
2 comments

The way I solved it once was to update ES synchronously and, if that failed or timed out, queue an event to index the doc. A timeout wasn’t an issue, because a double update wasn’t harmful.
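A minimal sketch of that pattern, in case it helps. Everything here is hypothetical: `FakeSearchIndex` is an in-memory stand-in rather than a real Elasticsearch client, `IndexFailure` covers both "failed" and "timed out", and queueing the id (not the doc) so the worker re-reads the DB is my assumption, not necessarily what was done originally.

```python
import queue


class IndexFailure(Exception):
    """Hypothetical error: the index call failed or timed out."""


class FakeSearchIndex:
    """In-memory stand-in for a search index (not a real ES client)."""

    def __init__(self, fail_times=0):
        self.docs = {}
        self.fail_times = fail_times  # how many upcoming calls should fail

    def index(self, doc_id, doc):
        if self.fail_times > 0:
            self.fail_times -= 1
            raise IndexFailure(doc_id)
        # Keyed by id, so indexing the same doc twice is harmless.
        self.docs[doc_id] = doc


def save(db, index, retry_queue, doc_id, doc):
    db[doc_id] = doc  # primary DB write comes first
    try:
        index.index(doc_id, doc)  # try the synchronous index update
    except IndexFailure:
        retry_queue.put(doc_id)  # fall back: a worker re-indexes later


def drain(db, index, retry_queue):
    """Worker body: re-index queued ids from the current DB state."""
    while not retry_queue.empty():
        doc_id = retry_queue.get()
        try:
            index.index(doc_id, db[doc_id])
        except IndexFailure:
            retry_queue.put(doc_id)  # still failing, try again next run
            break
```

Because the index write is keyed by id, a request that timed out but actually succeeded just gets written a second time by the worker, which is why the double update doesn't hurt.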
In instances like that I tend to push back on the requirement, for example with this classic DB + Elasticsearch case:

1. How often is a user going to perform an update and then search for the exact same thing immediately after?

2. Suppose they did: if Elasticsearch was updated in the background, is the queue/worker running fast enough that the user won't even notice, with a latency of a second or two at most?

It really depends on what you're doing, because if Elasticsearch is operating as its own source of truth with data that the primary DB doesn't have, then yeah, you're going to have trouble keeping both strongly consistent in a transactional manner without layering on complexity (like sagas with transactions and compensations). But if it's merely a search engine on top of your source of truth (for example, you search ES to get a list of primary keys and then fetch all the data from the DB), you've got some breathing room.
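To make the "search for primary keys, then fetch from the DB" case concrete, here is a rough sketch. The `articles` table, the `search_ids` helper, and the list-of-tuples index are all made up for illustration; a real setup would query Elasticsearch in `search_ids`.

```python
import sqlite3


def search_ids(index, term):
    """Hypothetical search step: return matching primary keys in ranked order."""
    return [doc_id for doc_id, text in index if term in text]


def hydrate(conn, ids):
    """Fetch current rows from the DB, keeping the search ranking."""
    if not ids:
        return []
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, title FROM articles WHERE id IN ({placeholders})",
        ids,
    ).fetchall()
    by_id = {row[0]: row for row in rows}
    # Ids the index still has but the DB no longer does are dropped here.
    return [by_id[i] for i in ids if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "queues, revised"), (2, "queues and workers")],
)

# A stale index: article 1 has an old title, article 3 was deleted from the DB.
stale_index = [(2, "queues and workers"), (1, "queues"), (3, "queues deleted")]
results = hydrate(conn, search_ids(stale_index, "queues"))
```

The user only ever sees rows as they exist in the DB, so a lagging index can affect which results match, but it can't show stale field values or deleted records.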

I mean, we're talking plucky upstart here and not enterprise FAANG, so there's definitely a case for 'less is more'.