HN Mirror



	by zx8080 855 days ago
	Elapsed: 12.618 sec, read 7.13 billion rows, 42.77 GB. This is too long; it seems the ORDER BY is not set up correctly for the table.
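
For scale, the quoted figures can be turned into throughput with simple division. This is only a back-of-the-envelope sketch: the variable names are illustrative, and it assumes "GB" means decimal gigabytes (10^9 bytes).

```python
# Figures copied from the quoted query summary.
elapsed_sec = 12.618
rows_read = 7.13e9
bytes_read = 42.77e9  # assumption: "GB" is decimal (10**9 bytes)

# Throughput is total work divided by elapsed time.
rows_per_sec = rows_read / elapsed_sec
bytes_per_sec = bytes_read / elapsed_sec

print(f"{rows_per_sec / 1e9:.3f} billion rows/s")
print(f"{bytes_per_sec / 1e9:.2f} GB/s")
```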

2 comments

zx8080 855 days ago

Also,

> `repo_name` LowCardinality(String),

This is not low cardinality:

7133122498 = 7.1B

Don't use low cardinality for such columns!

zX41ZdbW 854 days ago

The LowCardinality data type does not require the whole set of values to have a low cardinality. It benefits when the values have locally low cardinality. For example, if the number of unique values in `repo_name` is a hundred million, but for every million consecutive values, there are only ten thousand unique, it will give a great speed-up.
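
The difference between global and local cardinality can be illustrated with a small sketch. It is not ClickHouse code and says nothing about how ClickHouse stores dictionaries; the helper names, the synthetic `repo_` values, and the block size are all made up for illustration, with the numbers scaled far down from the ones in the comment.

```python
import random


def global_cardinality(values):
    """Number of distinct values in the whole column."""
    return len(set(values))


def max_local_cardinality(values, block_size):
    """Largest number of distinct values in any run of block_size consecutive values."""
    return max(
        len(set(values[i:i + block_size]))
        for i in range(0, len(values), block_size)
    )


def make_clustered(num_blocks, block_size, uniques_per_block, seed=0):
    """Synthetic column: each block draws from its own small pool of names."""
    rng = random.Random(seed)
    values = []
    for block in range(num_blocks):
        base = block * uniques_per_block
        values.extend(
            f"repo_{base + rng.randrange(uniques_per_block)}"
            for _ in range(block_size)
        )
    return values


# Hypothetical, scaled-down data: 100 blocks of 1000 values, 10 names per block.
values = make_clustered(num_blocks=100, block_size=1000, uniques_per_block=10)

print("global distinct values:", global_cardinality(values))
print("max distinct values per block:", max_local_cardinality(values, 1000))
```

In this toy data the column as a whole has many distinct values, yet no single block contains more than ten, which is the situation the comment describes.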

zx8080 850 days ago

> LowCardinality data type does not require the whole set of values to have a low cardinality.

Don't mislead others. That is not true unless `low_cardinality_max_dictionary_size` is set to something other than its default value of 8192.

It does not work well for a hundred million values.

zX41ZdbW 854 days ago

This is an ad-hoc query. It does a full scan, processing slightly less than a billion rows per second on a single machine, and finishes in a reasonable time with over 7 billion events on GitHub from 2015. While it does not make sense to optimize this table for my particular query, the fact that it works well for arbitrary queries is worth noting.