by zX41ZdbW
238 days ago
This and similar tasks can be solved efficiently with clickhouse-local [1]. Example:

  ch --input-format LineAsString --query "SELECT line, count() AS c GROUP BY line ORDER BY c DESC" < data.txt

I've tested it, and it is faster than both sort and this Rust code:

  time LC_ALL=C sort data.txt | uniq -c | sort -rn > /dev/null
  32 sec.

  time hist data.txt > /dev/null
  14 sec.

  time ch --input-format LineAsString --query "SELECT line, count() AS c GROUP BY line ORDER BY c DESC" < data.txt > /dev/null
  2.7 sec.

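For anyone unsure what all three commands are computing, here is a minimal Python sketch of the same result: count identical lines, then list them from most to least frequent. It is only an illustration of the semantics, not how sort, hist, or ClickHouse implement it, and it makes no claim about speed. The function name `line_histogram` and the sample input are made up for this example. Ties may come out in a different order than with the commands above, since the query only orders by count.

```python
from collections import Counter
import io


def line_histogram(lines):
    """Return (line, count) pairs for identical lines, most frequent first."""
    # Strip the trailing newline so "a\n" and a final "a" count as the same line.
    counts = Counter(line.rstrip("\n") for line in lines)
    # most_common() sorts by count, descending; equal counts keep first-seen order.
    return counts.most_common()


# Hypothetical sample input standing in for data.txt.
sample = io.StringIO("b\na\nb\nc\nb\na\n")
for line, c in line_histogram(sample):
    print(line, c)
```

Unlike the sort | uniq pipeline, this keeps one counter per distinct line in memory rather than sorting the whole input, which is roughly what a hash-based GROUP BY does as well.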
It is like a Swiss Army knife for data processing: it can solve various tasks, such as joining data from multiple files and data sources, processing various binary and text formats, converting between them, and accessing external databases.

[1] https://clickhouse.com/docs/operations/utilities/clickhouse-...