by henrydark
1572 days ago
Splink over DuckDB is the bomb. The DuckDB wrapper I sent you in the GitHub issue a few weeks ago linked a pair of five-million-record datasets in about twenty minutes. Spark took about three hours to do the same job on a cluster with effectively unlimited resources.