by henrydark 1572 days ago
Splink over duckdb is the bomb.

My duckdb wrapper, the one I sent you in the GitHub issue a few weeks ago, linked a pair of five-million-record datasets in about twenty minutes. Spark took about three hours to do the same job on a cluster with effectively unlimited resources.