

by simondotau 1076 days ago

I've done exactly this for about a decade and it has worked supremely well. Robust and resilient because it's simple and idempotent.

In my case I'm using Solr and my last_indexed field isn't written to until the Solr index call completes without error. I have a very basic lock on the indexing process which hasn't failed me yet, and if it ever did fail the consequences would only be wasted CPU cycles. I consider that a lower risk than updating last_indexed only to have the actual indexing fail unexpectedly.
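A minimal sketch of that ordering, not my actual code: it assumes a `docs` table with a hypothetical `updated_at` column next to `last_indexed`, uses SQLite in place of a real database, and takes a `send_to_index` callable standing in for the Solr call. The in-process `threading.Lock` is only a placeholder for whatever basic lock the scheduled task uses.

```python
import sqlite3
import threading

# Stand-in for the "very basic lock"; a real scheduled task would more
# likely use a lock file or a database row (assumption).
_index_lock = threading.Lock()


def make_db():
    """Create an in-memory table; the schema is hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, "
        "updated_at INTEGER NOT NULL, last_indexed INTEGER)"
    )
    return conn


def index_pending(conn, send_to_index, now):
    """Index rows that are new or changed; return how many were indexed."""
    if not _index_lock.acquire(blocking=False):
        return 0  # another run holds the lock; skipping costs nothing
    try:
        rows = conn.execute(
            "SELECT id, body FROM docs "
            "WHERE last_indexed IS NULL OR last_indexed < updated_at"
        ).fetchall()
        if not rows:
            return 0
        # If this raises, last_indexed is never written, so the same
        # rows are simply picked up again on the next run.
        send_to_index(rows)
        conn.executemany(
            "UPDATE docs SET last_indexed = ? WHERE id = ?",
            [(now, doc_id) for doc_id, _ in rows],
        )
        conn.commit()
        return len(rows)
    finally:
        _index_lock.release()
```

Because the timestamp is written only after the index call returns, a failed run leaves the rows marked as pending and the next run retries them. Re-sending a document that was already indexed is harmless, which is what makes the whole thing idempotent.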

In the rare instances where I've needed to re-index from scratch, the process has been incredibly simple:

1. Start a new instance of Solr on a powerful AWS instance and direct index updates to it

2. Set all last_indexed fields to NULL

3. Wait for the scheduled task to complete the re-indexing

4. Restart the new Solr instance on an AWS instance that is merely sufficient for normal load

5. Shift to the new Solr instance for search engine reads
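Steps 2 and 3 can be sketched roughly like this, again with SQLite and a hypothetical `docs` table. The batching and the `batch_size` parameter are assumptions for illustration, and `send_to_index` stands in for updates directed at the new Solr instance.

```python
import sqlite3


def reset_last_indexed(conn):
    """Step 2: mark every row as never indexed."""
    conn.execute("UPDATE docs SET last_indexed = NULL")
    conn.commit()


def pending_count(conn):
    """How many rows the scheduled task still has to index."""
    return conn.execute(
        "SELECT COUNT(*) FROM docs WHERE last_indexed IS NULL"
    ).fetchone()[0]


def run_scheduled_batch(conn, send_to_index, now, batch_size=2):
    """One run of the scheduled task, limited to batch_size rows."""
    rows = conn.execute(
        "SELECT id, body FROM docs WHERE last_indexed IS NULL "
        "ORDER BY id LIMIT ?",
        (batch_size,),
    ).fetchall()
    if rows:
        send_to_index(rows)  # timestamp is written only if this succeeds
        conn.executemany(
            "UPDATE docs SET last_indexed = ? WHERE id = ?",
            [(now, doc_id) for doc_id, _ in rows],
        )
        conn.commit()
    return len(rows)


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT, "
        "last_indexed INTEGER)"
    )
    conn.executemany(
        "INSERT INTO docs (body, last_indexed) VALUES (?, ?)",
        [("a", 5), ("b", 5), ("c", 5)],
    )
    new_index = []  # stands in for the fresh Solr instance
    reset_last_indexed(conn)
    runs = 0
    while pending_count(conn) > 0:  # step 3: wait until nothing is pending
        run_scheduled_batch(conn, new_index.extend, now=100)
        runs += 1
    return runs, [body for _, body in new_index]
```

The point is that a full re-index needs no special code path: clearing the field just makes every row look pending to the same task that handles day-to-day updates.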