by danpelota
2475 days ago
On occasion, I've fired up pandas just to sanitize a CSV file and drop malformed rows in preparation for bulk-ingesting it into a database:

  import pandas as pd
  pd.read_csv('bad_file.csv', error_bad_lines=False).to_csv('good_file.csv')

It's not efficient (it reads everything into memory), but read_csv is robust when it comes to handling embedded unescaped quotes, commas, etc., and it supports dropping rows that have the wrong number of columns due to anomalies it can't handle.
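
If memory is the concern, the same cleanup can be sketched as a streaming filter with the standard-library csv module. This is a swapped-in technique, not the pandas approach above, and it is less forgiving about unescaped quotes. The function name drop_malformed_rows is hypothetical, and the sketch assumes the first row is a header that defines the expected column count. It is also stricter than pandas, which as far as I know only treats lines with too many fields as bad. For anyone sticking with pandas: from version 1.3 on, error_bad_lines is deprecated in favor of on_bad_lines='skip', and it was removed in 2.0.

```python
import csv


def drop_malformed_rows(src_path, dst_path):
    """Stream src_path to dst_path, keeping rows whose field count matches the header.

    Hypothetical helper: assumes the first row is a header.
    Returns the number of rows dropped.
    """
    dropped = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return 0  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            if not row:
                continue  # blank line: skipped without being counted
            if len(row) == len(header):
                writer.writerow(row)
            else:
                dropped += 1  # too few or too many fields
    return dropped
```

Only one row is held in memory at a time, so this works on files larger than RAM. Usage would look like drop_malformed_rows('bad_file.csv', 'good_file.csv').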