HN Mirror


by bb01100100 447 days ago

Surely you’ve come across situations where line 10,000,021 of a 60M-line CSV fails to parse because that line doesn’t have enough fields…? The issue is that you can’t definitively know which of the 50 fields is missing, so you have to fail the line or, worse, the whole file.

In my experience (perhaps more niche than yours, since you mentioned it has been your day job), the lack of fallback options makes for brittle integrations. Failing an entire file because of one borked row can be expensive in terms of time.
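A minimal sketch of one such fallback in Python, assuming every row should have a fixed, known number of fields (the `good_rows` helper and its reject list are hypothetical, not from any particular tool): stream the file with the standard-library `csv` module, pass through rows with the right field count, and set the rest aside with their line numbers instead of failing the whole file.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple


def good_rows(
    lines: Iterable[str],
    expected_fields: int,
    rejects: List[Tuple[int, List[str]]],
) -> Iterator[List[str]]:
    """Yield rows with exactly `expected_fields` fields; quarantine the rest.

    Hypothetical helper. Rows are streamed, so a huge file is never held in
    memory; only the rejected rows are collected in `rejects`.
    """
    reader = csv.reader(lines)
    for row in reader:
        if len(row) == expected_fields:
            yield row
        else:
            # A short row can't be repaired safely (we don't know which field
            # is missing), so record where it was and keep going.
            # reader.line_num is the source line on which this record ended.
            rejects.append((reader.line_num, row))


if __name__ == "__main__":
    sample = "a,b,c\n1,2,3\n4,5\n6,7,8\n"
    bad: List[Tuple[int, List[str]]] = []
    for row in good_rows(io.StringIO(sample), expected_fields=3, rejects=bad):
        print(row)
    print("rejected:", bad)  # rejected: [(3, ['4', '5'])]
```

For a real file, open it with `open(path, newline="")` as the `csv` docs recommend. Whether some number of quarantined rows should still fail the load is a policy choice; the sketch just makes that choice possible instead of all-or-nothing.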

Having to ingest large CSV files from legacy systems has made me rethink the value of XML, lol. Types and schemas add complexity for sure, but you get options for dealing with variances in structure and content.


Someone1234 447 days ago

That is a problem, but it is also a problem with XML. Parsing an XML file to discover, e.g., mismatched tags is far more CPU- and memory-expensive than correctly parsing a CSV.

In both cases you'd fail the entire file rather than attempt a partial recovery.
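For comparison, a sketch of the XML side using Python's standard-library `xml.etree.ElementTree` (the `parse_whole` helper and the sample document are made up for illustration): a single mismatched tag makes `ET.fromstring` raise `ParseError` for the whole document, even though the other rows are well-formed. The error's `position` attribute at least reports the (line, column) where parsing broke.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple


def parse_whole(xml_text: str) -> Tuple[Optional[ET.Element], Optional[Tuple[int, int]]]:
    """Return (root, None) on success, or (None, (line, column)) on a parse error."""
    try:
        return ET.fromstring(xml_text), None
    except ET.ParseError as err:
        # One mismatched tag anywhere rejects the entire document.
        return None, err.position


if __name__ == "__main__":
    doc = (
        "<rows>\n"
        "<row><a>1</a><b>2</b></row>\n"
        "<row><a>3</b></row>\n"  # mismatched tag on line 3
        "<row><a>4</a><b>5</b></row>\n"
        "</rows>\n"
    )
    root, where = parse_whole(doc)
    print("root:", root, "error at:", where)
```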
