Hacker News
by pleasecalllater 3057 days ago
Well, I just wanted to use pandas to load a 4 GB CSV file. After it had eaten 32 GB of my RAM and 4 GB of swap, I gave up. I loaded all that data into Postgres instead and ran a couple of queries. That's how I stopped using pandas altogether.
2 comments

I found that pandas is great for data exploration and for data that you know is small (a few hundred MB). Beyond that, Python builtins and numpy arrays are a better alternative.
I hardly use pandas at this point besides read_csv, which is very good once you know the syntax for parsing strings/dates, skipping rows, dropping columns, etc.
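
For anyone who hasn't memorized those read_csv options, here is a rough sketch of what I mean. The sample data and the column names (when, sensor, value, notes) are made up for illustration; the keyword arguments (skiprows, usecols, parse_dates, dtype) are regular read_csv parameters.

```python
import io

import pandas as pd

# Hypothetical sample: two junk preamble lines, then a header and two rows.
RAW = """exported by some tool
units: metres
when,sensor,value,notes
2017-01-01,a,1.5,ok
2017-01-02,b,2.5,check
"""


def load(source):
    """Read only the columns we need, with explicit types."""
    return pd.read_csv(
        source,
        skiprows=2,                           # skip the two preamble lines
        usecols=["when", "sensor", "value"],  # "notes" is never loaded
        parse_dates=["when"],                 # parse this column as datetimes
        dtype={"value": "float64"},           # avoid type guessing for numbers
    )


df = load(io.StringIO(RAW))
print(df.dtypes)
```

Dropping columns at read time with usecols, rather than after loading, is probably the cheapest way to cut memory, since the unwanted data never lands in the DataFrame.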

After that I usually just keep the numpy array since all I need is floats. I guess the index groupby stuff is cool, but I never really needed it. Postgres is fine but if you're just doing numerics it doesn't help much.

It helps by keeping the RAM requirement smaller. And I get GROUP BY and materialized indices, which help a lot with preserving huge modified datasets.
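
A minimal sketch of the idea, with a big caveat: it uses Python's built-in sqlite3 module as a stand-in so it runs without a server, not Postgres itself. The table name (readings), index name, and rows are hypothetical. The point is only that the grouping happens inside the database and just the small result comes back to Python.

```python
import sqlite3

# In-memory SQLite database as a stand-in for a real Postgres instance.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE readings (sensor TEXT, value REAL)")
conn.executemany(
    "INSERT INTO readings VALUES (?, ?)",
    [("a", 1.0), ("a", 3.0), ("b", 10.0)],  # hypothetical rows
)

# An index on the grouping column; it persists with the table.
conn.execute("CREATE INDEX idx_readings_sensor ON readings (sensor)")

# The aggregation runs in the database; only one row per sensor is returned.
rows = conn.execute(
    "SELECT sensor, COUNT(*), AVG(value) "
    "FROM readings GROUP BY sensor ORDER BY sensor"
).fetchall()

for sensor, count, mean in rows:
    print(sensor, count, mean)

conn.close()
```

With Postgres the SQL would look much the same, but the connection code and bulk loading differ, so treat this as an illustration of the workflow rather than a drop-in script.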