Hacker News new | ask | show | jobs
by zx8080 855 days ago
Also,

> `repo_name` LowCardinality(String),

This is not a low cardinality:

7133122498 = 7.1B

Don't use low cardinality for such columns!

1 comments

The LowCardinality data type does not require the whole set of values to have a low cardinality. It benefits when the values have locally low cardinality. For example, if the number of unique values in `repo_name` is a hundred million, but for every million consecutive values, there are only ten thousand unique, it will give a great speed-up.
> LowCardinality data type does not require the whole set of values to have a low cardinality.

Don't mislead others. It's not true unless low_cardinality_max_dictionary_size is set to some other value than the default one: 8192.

It does not work well for hundred million values.