Hacker News new | ask | show | jobs
by dobe 4087 days ago
I guess you are talking about the plan to have the data of one shard only on one disk (see https://github.com/elastic/elasticsearch/issues/9498)? This does not necessarily mean that you will end up having only one shard per datapath - only if you have just one shard per node. But you are right, the change might lead to unbalanced disk usage in some scenarios, where increasing the number of shards would solve the problem.

There are two options:

1. (Recommended for now) Export the table with COPY TO ( https://crate.io/docs/stable/sql/reference/copy_to.html). Drop the table and then import it again using COPY FROM (https://crate.io/docs/stable/sql/reference/copy_from.html ).

2. Use insert by query (see https://crate.io/docs/stable/sql/dml.html#inserting-data-by-... ) if it is ok for you to copy the whole data to another table (with more shards).

1) is recommended, since it allows for throttling on import time (see https://crate.io/docs/en/latest/best_practice/data_import.ht...) and also does not require a rename of a table, which is currently not implemented but is on our backlog. However i think once ES 2.0 is out we will have table renames and also throttling in insert by query, so option 2) will be recommended then.

Our genreal recommendation to the fixed number of shards limitation is to choose a higher number of shards upfront (number of expected cores matches the most use cases) or to use partitioned tables (https://crate.io/docs/en/latest/sql/partitioned_tables.html) where possible since those allow to change the number of shards for future partitions.