|
|
|
|
|
by csjh
115 days ago
|
|
Obligatory DuckDB solution: > duckdb -s "COPY (SELECT url[20:] as url, date, count(*) as c FROM read_csv('data.csv', columns = { 'url': 'VARCHAR', 'date': 'DATE' }) GROUP BY url, date) TO 'output.json' (ARRAY)" Takes about 8 seconds on my M1 Macbook. JSON not in the right format, but that wouldn't dominate the execution time. |
|
https://github.com/tempestphp/100-million-row-challenge/pull...