by jhgg 3380 days ago
The first search you run in a server (or the first search after that server's index fails) triggers a full re-index of that server. Ctrl-F "Historical Index" in the blog post for more details! If search has never been used in a server, its messages are not indexed in real time until someone searches there for the first time. Both of these things make the system "lazy".

The worst case for an index failure is that the search query is delayed while the index rebuilds itself. We throttle the rate of historical indexing into ES to a safe level so that we don't degrade the performance of other components of the system.
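
To make the throttling idea concrete, here is a minimal sketch of one way to pace historical batches. It is not the actual implementation: the function name `throttled_batches`, the `batch_size` and `min_interval` parameters, and the fixed-interval pacing strategy are all assumptions made for illustration. Each yielded batch would then be handed to whatever bulk-insert call the indexer uses.

```python
import time
from typing import Callable, Iterable, Iterator, List, Optional


def throttled_batches(
    messages: Iterable[dict],
    batch_size: int,
    min_interval: float,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[List[dict]]:
    """Group messages into batches and yield at most one batch per min_interval seconds.

    Hypothetical helper: the names and the pacing strategy are assumptions,
    not taken from the blog post or the thread.
    """
    last_emit: Optional[float] = None

    def pace() -> None:
        # Sleep just long enough to keep min_interval between two batches.
        nonlocal last_emit
        if last_emit is not None:
            wait = min_interval - (clock() - last_emit)
            if wait > 0:
                sleep(wait)
        last_emit = clock()

    batch: List[dict] = []
    for msg in messages:
        batch.append(msg)
        if len(batch) == batch_size:
            pace()
            yield batch
            batch = []
    if batch:
        # Flush the final, partially filled batch under the same pacing rule.
        pace()
        yield batch
```

The clock and sleep functions are injectable only so the pacing can be exercised without real delays; a production worker would more likely adapt its rate to cluster load than use a fixed interval.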


Oh, I think I get it now. Is it that the _initial_ indexing is lazy, but all indexing after that is done automatically by the historical index workers? Basically, when a user searches for something, do you check that ES has something for that user and, if it doesn't, start off the initial indexing process, and from there the workers do their thing?
The historical index workers index the history of the server, whereas we have a real-time index worker that is consuming and bulk inserting messages in real time. Searching for the first time in a server turns on real-time indexing and triggers a historical index of all previous messages in that server.
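
The behaviour described above can be sketched as a small state machine. This is an illustrative toy, not the real system: the `LazyIndexer` class, its method names, and the in-memory sets and lists standing in for ES and the job queue are all hypothetical. How index failures are detected, and whether real-time indexing pauses after one, is not stated in the thread; the sketch simply forgets the server so that the next search re-triggers everything.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class LazyIndexer:
    """Toy model of lazy search indexing (all names here are hypothetical)."""

    # Servers where real-time indexing has been switched on.
    enabled: Set[str] = field(default_factory=set)
    # Queue of servers awaiting a historical (full) index, in request order.
    historical_jobs: List[str] = field(default_factory=list)
    # Stand-in for the search index: server id -> indexed message texts.
    indexed: Dict[str, List[str]] = field(default_factory=dict)

    def on_message(self, server_id: str, text: str) -> bool:
        """Index a new message only if search was ever used in this server."""
        if server_id not in self.enabled:
            return False
        self.indexed.setdefault(server_id, []).append(text)
        return True

    def on_search(self, server_id: str) -> bool:
        """Return True if this search triggered a historical index."""
        if server_id in self.enabled:
            return False
        self.enabled.add(server_id)            # turn on real-time indexing
        self.historical_jobs.append(server_id)  # backfill previous messages
        return True

    def on_index_failure(self, server_id: str) -> None:
        """Assumption: a failed index is dropped and rebuilt on the next search."""
        self.enabled.discard(server_id)
        self.indexed.pop(server_id, None)
```

In this model a message sent before anyone has searched is simply skipped by `on_message`, and it only becomes searchable once the historical job queued by the first `on_search` has backfilled it.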
Can you talk about the bulk insert for real-time messages? How does this get triggered: does it run every X minutes or every X messages? (Your code looks like it is every X messages.)

Are you using DB triggers to fire the job?

Got it. I was under the impression that all messages were lazily indexed. After reading the article again, it's pretty clear that's not the case. Thanks for the clarification.