Hacker News
by danpelota 2475 days ago
On occasion, I've fired up pandas just to sanitize a CSV file and drop malformed rows before bulk-ingesting it into a database:

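  import pandas as pd
  # error_bad_lines=False was removed in pandas 2.0; on_bad_lines='skip' (pandas >= 1.3) replaces it
  # index=False keeps the DataFrame index from being written as an extra column
  pd.read_csv('bad_file.csv', on_bad_lines='skip').to_csv('good_file.csv', index=False)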
It's not efficient (it reads the whole file into memory), but read_csv is robust at handling embedded unescaped quotes, commas, etc., and it can drop rows that end up with the wrong number of columns because of anomalies it can't handle.
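If memory is the concern, a streaming pass with the standard library's csv module is one alternative. This is only a rough sketch, not the pandas approach above: the function name drop_malformed_rows is made up for illustration, it assumes the first row is a header that defines the expected column count, and the csv module is not as forgiving as read_csv with badly quoted input.

```python
import csv


def drop_malformed_rows(src_path, dst_path):
    """Copy src_path to dst_path one row at a time, keeping only rows
    whose field count matches the header row (hypothetical helper)."""
    kept = dropped = 0
    with open(src_path, newline='') as src, open(dst_path, 'w', newline='') as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst)
        header = next(reader, None)
        if header is None:
            return kept, dropped  # empty input: nothing to copy
        writer.writerow(header)
        for row in reader:
            # Assumption: a row is "malformed" when its field count
            # differs from the header's (blank lines count as malformed).
            if len(row) == len(header):
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

Calling drop_malformed_rows('bad_file.csv', 'good_file.csv') returns a (kept, dropped) tuple, which is handy for a sanity check before the bulk load.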