Hacker News
by tkyjonathan 2366 days ago
Not sure which DB you are using, but you can load the CSV file into the DB directly, on a single thread, using something like MySQL's LOAD DATA INFILE.
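LOAD DATA INFILE itself is a MySQL statement and needs a running server, so the sketch below shows the same idea (stream the file straight into a table instead of building objects in Python) using SQLite from the standard library. SQLite has no LOAD DATA INFILE, so this swaps in batched `executemany` inserts inside one transaction. The table name `sales`, its columns, and the helper `bulk_load_csv` are hypothetical.

```python
import csv
import io
import sqlite3
from itertools import islice
from typing import TextIO


def bulk_load_csv(conn: sqlite3.Connection, table: str, src: TextIO,
                  batch_size: int = 10_000) -> int:
    """Stream CSV rows into an existing table in fixed-size batches.

    SQLite has no LOAD DATA INFILE, so this batches INSERTs inside a
    single transaction instead. Assumes the first CSV line holds column
    names that match the table. Returns the number of rows inserted.
    """
    reader = csv.reader(src)
    header = next(reader)
    cols = ", ".join(f'"{c}"' for c in header)
    placeholders = ", ".join("?" for _ in header)
    sql = f'INSERT INTO "{table}" ({cols}) VALUES ({placeholders})'
    total = 0
    with conn:  # commits once at the end, rolls back on error
        while True:
            batch = list(islice(reader, batch_size))  # only one batch in memory
            if not batch:
                break
            conn.executemany(sql, batch)
            total += len(batch)
    return total


# Demo with an in-memory database and a tiny CSV held in a string.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,2\n")
loaded = bulk_load_csv(conn, "sales", sample, batch_size=2)
```

For a real file you would pass `open("data.csv", newline="")` (hypothetical path) instead of the `StringIO`. On MySQL itself, a single LOAD DATA INFILE statement would normally replace this loop entirely.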

If you have some good indexes and do some push-down work (give the database the aggregation tasks instead of doing them in your Python code), you should probably be more than fine.
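A minimal sketch of what "push-down" means, again assuming SQLite and a hypothetical `sales` table: both functions compute per-region totals, but the first ships every row into Python while the second lets the database run the `GROUP BY` and returns one row per group. With a 250 GB table, that difference is what matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("north", 10.5), ("south", 4.0), ("north", 2.0)])
# An index on the column you group or filter by gives the query planner
# the option of skipping a sort or a full table scan.
conn.execute("CREATE INDEX idx_sales_region ON sales (region)")


def totals_in_python(conn: sqlite3.Connection) -> dict:
    """No push-down: every row is shipped to Python and summed there."""
    totals = {}
    for region, amount in conn.execute("SELECT region, amount FROM sales"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(conn: sqlite3.Connection) -> dict:
    """Push-down: the database runs the GROUP BY; one row per group comes back."""
    rows = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    ).fetchall()
    return dict(rows)
```

Both return the same totals here; the pushed-down version just moves far less data out of the database.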

For a 250 GB file, that should be OK. Maybe add some partitioning too.
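What partitioning could look like, assuming MySQL (since LOAD DATA INFILE is MySQL's) and a hypothetical `events` table with a date column: the helper below only builds the `CREATE TABLE ... PARTITION BY RANGE` statement as a string, one partition per year plus a catch-all. Queries that filter on `event_date` can then be limited to the matching partitions. The schema and the helper name are illustrative, not from the thread.

```python
from typing import Iterable


def mysql_range_partition_ddl(table: str, years: Iterable[int]) -> str:
    """Return MySQL DDL for a hypothetical table partitioned by YEAR(event_date).

    MySQL requires the partitioning column to appear in every unique key,
    which is why event_date is part of the primary key.
    """
    parts = [f"  PARTITION p{y} VALUES LESS THAN ({y + 1})" for y in sorted(years)]
    # Catch-all partition so rows from later years still have somewhere to go.
    parts.append("  PARTITION pmax VALUES LESS THAN MAXVALUE")
    return (
        f"CREATE TABLE {table} (\n"
        "  id BIGINT NOT NULL,\n"
        "  event_date DATE NOT NULL,\n"
        "  amount DECIMAL(12, 2),\n"
        "  PRIMARY KEY (id, event_date)\n"
        ")\n"
        "PARTITION BY RANGE (YEAR(event_date)) (\n"
        + ",\n".join(parts)
        + "\n);"
    )


print(mysql_range_partition_ddl("events", [2018, 2019]))
```

Partitioning by year is just one choice; the right key is whatever column your queries filter on most.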

I'm open to using any DB that I can query through an engine with a Python implementation, so any SQL DB should be fine. However, I don't know how to load a CSV into an SQL database directly. Is the command you mentioned part of some SQL server package? It sounds like exactly what I need.

pandas can read from a CSV file (`read_csv`) and then write to an SQL database (`DataFrame.to_sql`). Even if you don't go the SQL route, you'd probably gain significant benefits by working with HDF instead of CSV.

https://pandas.pydata.org/pandas-docs/version/0.22/generated...

https://pandas.pydata.org/pandas-docs/version/0.22/io.html#w...
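A rough sketch of the pandas route, assuming an SQLite target (pandas accepts a plain `sqlite3` connection for `to_sql`). Reading with `chunksize` keeps a 250 GB file from being loaded into memory at once. The helper `csv_to_sql` and the `sales` table are hypothetical. The HDF route mentioned above goes through `DataFrame.to_hdf` / `HDFStore`, which need the optional PyTables package, so it is left out here.

```python
import io
import sqlite3

import pandas as pd


def csv_to_sql(src, conn: sqlite3.Connection, table: str,
               chunksize: int = 100_000) -> int:
    """Copy a CSV into an SQL table chunk by chunk, so the whole file never sits in memory."""
    total = 0
    for chunk in pd.read_csv(src, chunksize=chunksize):
        # The first chunk creates the table; "append" adds the remaining chunks to it.
        chunk.to_sql(table, conn, if_exists="append", index=False)
        total += len(chunk)
    return total


conn = sqlite3.connect(":memory:")
sample = io.StringIO("region,amount\nnorth,10.5\nsouth,4\nnorth,2\n")
copied = csv_to_sql(sample, conn, "sales", chunksize=2)
```

For the real file, pass its path instead of the `StringIO` and pick a `chunksize` that fits comfortably in RAM.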