Hacker News new | ask | show | jobs
by hu3 822 days ago
Related, clickhouse local cli command is a speed demon to parse and query JSON and other formats such as CSV:

- "The world’s fastest tool for querying JSON files" https://clickhouse.com/blog/worlds-fastest-json-querying-too...

- "Show HN: ClickHouse-local – a small tool for serverless data analytics" https://news.ycombinator.com/item?id=34265206

1 comments

I'll second this. Clickhouse is amazing. I was actually using it today to query some CSV files. I had to refresh my memory on the syntax so if anyone is interested:

  clickhouse local -q "SELECT foo, sum(bar) FROM file('foobar.csv', CSV) GROUP BY foo FORMAT Pretty"
Way easier than opening in Excel and creating a pivot table which was my previous workflow.

Here's a list of the different input and output formats that it supports.

https://clickhouse.com/docs/en/interfaces/formats

You don't even need to use file() for a lot of things recently. These just work with clickhouse local. Even wildcards work.

  select * from `foobar.csv`
or

  select * from `monthly-report-*.csv`
Just had to try:

  $ function _select_aux () { clickhouse local -q "SELECT $* FORMAT Pretty" }
  $ alias SELECT='noglob _select_aux'
  $ SELECT COUNT(*) as count FROM file('repos.json', JSON)
  ┏━━━━━━━┓
  ┃ count ┃
  ┡━━━━━━━┩
  │    30 │
  └───────┘
This is very nice!! I went down the format clause[1] rabbit hole a couple minutes ago and it looks like they have `FORMAT Markdown` as an option. FORMAT PrettySpaceNoEscapes also looks a bit better than Pretty or the default PrettyCompact imho

1. https://clickhouse.com/docs/en/interfaces/formats

Where's this noglob command coming from? I haven't heard of it before.
Ah sorry that is probably zsh specific, it makes sure to not glob expand * etc
Ooh very nice, thanks for the tip!