by outworlder 2436 days ago
This part left me scratching my head:

> We have set “replica 0” in our indexes settings

> Now let’s assume that node 3 goes down:

> As expected, all shards from node 3 are moved to node 1 and node 2

No. There are no shards that can be moved, because the number of replicas was set to zero and one node went down. Not sure what they are trying to explain here.

> In order to resolve this issue, we introduced a job which runs each day in order to update the mapping template and create the index for the day of tomorrow, with the right number of shards according to the number of hits our customer received the previous day.

This is a very common use-case (e.g. logging), so it's surprising that Elastic has nothing to automate this.


> This is a very common use-case (e.g. logging), so it's surprising that Elastic has nothing to automate this.

You can set an index template to be used on new indices that match a pattern, which is a very common thing to do. It sounds like what they did was modify the template daily, which is less common IME. It's not clear why they had to manually create the index, though. That should happen automatically.
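For readers who have not used index templates, here is a minimal sketch of what such a template body might look like. It assumes the legacy template endpoint (`PUT _template/<name>`); newer Elasticsearch versions use composable templates with a slightly different layout. The pattern `logs-*` and the helper name `build_index_template` are illustrative, not taken from the article.

```python
import json


def build_index_template(pattern: str, shards: int, replicas: int) -> dict:
    """Build the JSON body for a legacy index template.

    Any new index whose name matches `pattern` would pick up these settings.
    """
    return {
        "index_patterns": [pattern],
        "settings": {
            "number_of_shards": shards,
            "number_of_replicas": replicas,
        },
    }


if __name__ == "__main__":
    # Hypothetical values: 3 primary shards, 1 replica per primary.
    print(json.dumps(build_index_template("logs-*", 3, 1), indent=2))
```

The body would then be sent with whatever HTTP client you already use; nothing here talks to a cluster.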

> You can set an index template to be used on new indices that match a pattern, which is a very common thing to do

It is, but how can you tell in your template you want to keep shard sizes under 50GB? You can't.

The best thing you can do (as they did) is, based on historical data, update the template, so that the new index will have shards that (hopefully) are under 50GB.
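As a rough illustration of that calculation, the sketch below derives a shard count from the previous day's index size. The 50GB ceiling comes from the comment above; the function name `shards_for` and the use of yesterday's size as the estimate are assumptions, and the article's job uses hit counts rather than bytes.

```python
GIB = 1024 ** 3  # bytes in one GiB


def shards_for(expected_bytes: int, max_shard_bytes: int = 50 * GIB) -> int:
    """Smallest shard count that keeps each shard at or under max_shard_bytes.

    Assumes data is spread evenly across shards, which is only
    approximately true in practice.
    """
    if expected_bytes < 0 or max_shard_bytes <= 0:
        raise ValueError("sizes must be non-negative, and the limit positive")
    # Integer ceiling division; an index always needs at least one shard.
    return max(1, -(-expected_bytes // max_shard_bytes))


if __name__ == "__main__":
    # Hypothetical: yesterday's index was 120 GiB.
    print(shards_for(120 * GIB))
```

The result is still only a guess: if tomorrow's traffic is much higher than yesterday's, the shards will exceed the target anyway.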

Indices are composed of one or more primary shards. Each primary shard can have one replica. Three nodes, each with one primary shard as a part of that single index, no replicas in play at all.

> Indices are composed of one or more primary shards. Each primary shard can have one replica. Three nodes, each with one primary shard as a part of that single index, no replicas in play at all.

Ok, 3 nodes, each with one primary shard. No replicas. 1 node goes down, one shard is no longer found in the cluster, because it was in the missing node. That particular index, and in fact the whole cluster, are now RED.

Unless you discard that shard (force reroute, with accept_data_loss), nothing is going to be recovered and the missing shards will not be allocated anywhere.
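For reference, the forced reroute mentioned above is done through the cluster reroute API (`POST _cluster/reroute`) with an `allocate_empty_primary` command. The sketch below only builds the request body; the index and node names are hypothetical, and the helper name is made up for illustration.

```python
import json


def allocate_empty_primary(index: str, shard: int, node: str) -> dict:
    """Build a reroute body that assigns an EMPTY primary to `node`.

    Whatever data the lost shard held is discarded, which is why the
    API requires accept_data_loss to be set explicitly.
    """
    return {
        "commands": [
            {
                "allocate_empty_primary": {
                    "index": index,
                    "shard": shard,
                    "node": node,
                    "accept_data_loss": True,
                }
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical index, shard number and node name.
    print(json.dumps(allocate_empty_primary("logs-2019.01.01", 2, "node-1"), indent=2))
```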

Oh, yes, sorry, I see your perspective now. You will get data loss in this example. My understanding was that the example shows how one node can end up with all the write operations; I wasn't under the impression that it was a "real" cluster.
Author here.

> replica 0

It's an example for the article and the intention was to remove the complexity of primary/replica shards. Treat a "shard" as a unit, regardless of whether it is a primary or a replica. In fact, with replicas set to 1 the behaviour would be the same, but the diagram would have twice as many shards. What we wanted to show is that IF one node goes down and comes back up after a while, AND a rollover occurs just after that, then this node will receive all the new shards and so handle all the writes. That is where the write-spreading issue comes from.

> introduction of a job which runs each day

This job has several purposes:

1) Because of rollover, our indexes are now suffixed with -000001, then -000002, and so on. Our applications are unaware of the rollover: they post and get documents through the alias in front of these suffixed indexes. So, in a daily index design, if you don't create the index for the next day in advance, your application will create, at 00:00:01, a new index named after the alias, and it won't be in rollover "mode".
2) Because we use the ILM feature, we need to define the ILM rollover alias in the template, and it changes each day because our index names are "index_$date".
3) Performance: we have a lot of traffic, and if we do not create the index for the next day beforehand, we can expect a lot of unassigned shards, a yellow cluster, etc.
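To make the first two points concrete, here is a minimal sketch of what such a daily job might compute before calling the create-index API. It is not the author's job: the date format, the policy name, and setting the ILM options directly on the index (rather than in the template, as the author describes) are all assumptions. The settings `index.lifecycle.name` and `index.lifecycle.rollover_alias`, and the `is_write_index` alias flag, are standard Elasticsearch options.

```python
from datetime import date, timedelta


def next_day_index(prefix: str, today: date, policy: str, shards: int):
    """Return (index_name, body) for tomorrow's first rollover index."""
    tomorrow = today + timedelta(days=1)
    # Applications read and write through this alias, e.g. "index_2019.02.01".
    alias = f"{prefix}_{tomorrow:%Y.%m.%d}"
    # First physical index behind the alias; rollover then creates -000002, ...
    name = f"{alias}-000001"
    body = {
        "settings": {
            "number_of_shards": shards,
            "index.lifecycle.name": policy,
            "index.lifecycle.rollover_alias": alias,
        },
        "aliases": {alias: {"is_write_index": True}},
    }
    return name, body


if __name__ == "__main__":
    # Hypothetical prefix, policy name and shard count.
    name, body = next_day_index("index", date.today(), "daily_policy", 4)
    print(name)
    print(body)
```

Creating the index ahead of midnight this way means the alias already exists when the first write of the day arrives, so the application does not auto-create a plain index under the alias name.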

In fact, yes, it's a common use-case (daily based index), but maybe not with rollover + ILM.

> replica 0 It's an example for the article and the intention was to remove the complexity of primary/replica shards.

Ah, got it. So maybe it would be best said as "for the following example, ignore any replicas".

> in fact, yes, it's a common use-case (daily based index)

But it is not automated by the Elastic folks. Do you have any intentions of open-sourcing a portion of this job?