by shinypenguin 234 days ago
Is the dataset accessible somewhere? Does anyone know more about the "1T challenge", or is it just the 1B challenge moved up a notch?

It would be interesting to see whether such data could be handled on one node, since the servers they are using are quite beefy.

2 comments

Hi shinypenguin - the dataset and challenge are detailed here: https://github.com/coiled/1trc

The data is in a publicly accessible bucket, but the requester is responsible for any egress fees...

I suggest linking to that from the article; it is a useful clarification.

Good point - I'll update it...

Hi, thank you for the link and quick response! :)

Do you know if anyone has attempted to run this on as little hardware as possible with reasonable processing times?

Yes - I also had GizmoSQL (a single-node DuckDB database engine) take the challenge, with very good performance (2 minutes, for $0.10 in cloud compute cost): https://gizmodata.com/blog/gizmosql-one-trillion-row-challen...

The One Trillion Row Challenge was proposed by Coiled in 2024: https://docs.coiled.io/blog/1trc.html