by PolarizedPoutin 796 days ago
Wish I'd seen this before I started, haha! I left a footnote about why I didn't try binary copy (basically, someone else found its performance disappointing), but it sounds like I should give it a try.

footnote: https://aliramadhan.me/2024/03/31/trillion-rows.html#fn:copy...


Yeah, I imagine it depends on where the data is coming from and what exactly it looks like (number of fields, dtypes...?). What I did was source data -> NumPy structured array [0] -> Postgres binary [1]. It's a bit of a pain getting it into the required shape, but if you follow the links, the code should get you going (sorry, no type hints!).

[0] https://rdrn.me/optimising-sampling/#round-10-off-the-deep-e...
[1] In the original blog I linked.
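
For anyone who wants a starting point, here is a minimal sketch of the structured array -> Postgres binary step. It is not the code from the linked posts. The function name `to_pg_binary` and the helper field names `_nfields` and `_len_*` are made up for illustration, and it assumes a 1-D array with only non-NULL int16/int32/int64/float32/float64 columns. The idea is to lay out each row in the COPY BINARY wire format (a 16-bit field count, then a 32-bit byte length before each value, all big-endian) as one packed NumPy dtype, so the whole array serialises with a single `tobytes()` call.

```python
import struct

import numpy as np

# Fixed 11-byte signature that opens every COPY BINARY stream.
PGCOPY_SIGNATURE = b"PGCOPY\n\xff\r\n\x00"

# (kind, itemsize) pairs this sketch handles: smallint, integer, bigint,
# real, double precision.
SUPPORTED = {("i", 2), ("i", 4), ("i", 8), ("f", 4), ("f", 8)}


def to_pg_binary(rows: np.ndarray) -> bytes:
    """Encode a 1-D structured array as Postgres COPY BINARY bytes.

    Assumption: every field is a fixed-width signed int or float and
    there are no NULLs.
    """
    names = rows.dtype.names
    if names is None:
        raise TypeError("expected a structured array")

    # Per-row wire layout: int16 field count, then for each column an
    # int32 byte length followed by the big-endian value.
    wire_fields = [("_nfields", ">i2")]
    for name in names:
        base = rows.dtype[name]
        if (base.kind, base.itemsize) not in SUPPORTED:
            raise TypeError(f"unsupported dtype for field {name!r}: {base}")
        wire_fields.append((f"_len_{name}", ">i4"))
        wire_fields.append((name, base.newbyteorder(">")))

    # np.dtype built from a list of fields is packed (no padding) by default.
    wire = np.empty(rows.shape[0], dtype=np.dtype(wire_fields))
    wire["_nfields"] = len(names)
    for name in names:
        wire[f"_len_{name}"] = rows.dtype[name].itemsize
        wire[name] = rows[name]  # value-preserving byte-order conversion

    header = PGCOPY_SIGNATURE + struct.pack(">ii", 0, 0)  # flags, extension length
    trailer = struct.pack(">h", -1)  # end-of-data marker
    return header + wire.tobytes() + trailer


rows = np.array([(1, 2.5), (2, -1.0)], dtype=[("id", "<i4"), ("val", "<f8")])
payload = to_pg_binary(rows)
```

The resulting bytes are meant to be streamed into `COPY my_table (id, val) FROM STDIN WITH (FORMAT binary)` (table and column names hypothetical), for example through psycopg2's `cursor.copy_expert`. The column types have to match the wire widths exactly (`integer` for int32, `double precision` for float64), otherwise Postgres rejects the data.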

I'd love to hear from anyone who's done the same in MySQL