| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by csjh 115 days ago

Obligatory DuckDB solution:

> duckdb -s "COPY (SELECT url[20:] as url, date, count(*) as c FROM read_csv('data.csv', columns = { 'url': 'VARCHAR', 'date': 'DATE' }) GROUP BY url, date) TO 'output.json' (ARRAY)"

Takes about 8 seconds on my M1 Macbook. JSON not in the right format, but that wouldn't dominate the execution time.

1 comments

cess11 115 days ago

This log in one of the PR:s claims a 5.4s running time on some Mac.

https://github.com/tempestphp/100-million-row-challenge/pull...

link