|
|
|
|
|
by mhneu
3358 days ago
|
|
Python's data infrastructure has a huge problem: serialization and thus saving data results. A good serialization library should serialize: - classes/objects (best practice: objects for holding data)
- pandas/numpy objects (must have: minimizing space)
- namedtuples (currently: a mess, factory implementation)
- dicts and lists of dicts (must have: space efficiency)
Compare to Matlab: save(f, 'anyobject'); anyobject=load(f)Python is terrible at this and it limits use in real data analysis environments and limits competition with matlab. |
|
If you want matlab files in Python you can use `scipy.io.loadmat('file.mat')`. PyTables (built on hdf5) is a better solution since the hdf5 format is a lot more flexible than matlab's (ime). But Parquet is looking to be the best solution moving forward as it's gaining a lot of mindshare as the go-to flexible format for data and will be / is used in Arrow.
But really, Matlab is on par with pickles when it comes to serialisation. It's a trap solution.