by peatmoss
4192 days ago
> large datasets on disk

I saw this benchmark a while back comparing Pandas to SQLite in-memory databases. While Pandas did edge out SQLite in several areas, it was by well under an order of magnitude: http://wesmckinney.com/blog/?p=414

Pretty solid performance plus the ability to work with large datasets on disk seemed like a pretty big win to me. I could imagine a set of SQLite extensions (a la spatialite) that could further optimize for various data.frame use cases. As an added bonus, the same libraries would be very portable between different languages--even languages that don't currently have something like dataframes.

EDIT: What I don't know about is memory efficiency. Perhaps SQLite isn't memory-efficient, but I wouldn't bet against it.
There were two reasons for the switch. First, SQL syntax is cleaner and better understood by others. Second, if you get a dataset bigger than memory, you aren't stuck.
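To make the "bigger than memory" point concrete, here is a rough sketch using only Python's standard-library sqlite3 module. The table name `obs`, its columns, the helper `mean_by_group`, and the toy rows are all made up for illustration. The data lives in a database file on disk and the aggregation runs in SQL, so only the small grouped result comes back into Python.

```python
import os
import sqlite3
import tempfile


def mean_by_group(db_path, rows):
    """Load (grp, value) rows into an on-disk SQLite table, then aggregate in SQL.

    Hypothetical helper: the table and column names are illustrative only.
    """
    conn = sqlite3.connect(db_path)  # a file path, so the table is stored on disk
    try:
        conn.execute("CREATE TABLE IF NOT EXISTS obs (grp TEXT, value REAL)")
        # executemany accepts any iterable, so rows can be streamed from a generator
        conn.executemany("INSERT INTO obs VALUES (?, ?)", rows)
        conn.commit()
        cur = conn.execute(
            "SELECT grp, AVG(value) FROM obs GROUP BY grp ORDER BY grp"
        )
        return cur.fetchall()  # only the aggregated rows are materialized
    finally:
        conn.close()


# Toy data standing in for a much larger dataset (assumption for the example).
db_path = os.path.join(tempfile.mkdtemp(), "obs.db")
rows = (("a" if i % 2 == 0 else "b", float(i)) for i in range(10))
result = mean_by_group(db_path, rows)
print(result)
```

Whether this beats a dataframe on speed depends on the workload, but the same query works unchanged when the table no longer fits in RAM.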