| HN Mirror


by aleph_minus_one 456 days ago

> The classic "I'll write my own csv parser - how hard can it be?"

I did as part of my work. It was easy.

To be very clear: the CSV files that are used are outputs from another tool, so they are much more "well-behaved" and "well-defined" (e.g. no escaping, in particular none for newlines; well-known separators; well-known encoding; ...) than many CSV files that you find on the internet.

On the other hand, some columns need a little bit of "special" handling (you could also do this as a post-processing step, but it is faster to be able to attach a handler to a column to do this handling directly during the parsing).
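To illustrate what attaching a handler to a column can look like, here is a minimal sketch in Python. It is not the actual implementation; the function name `read_simple_csv`, the `;` separator, the header row, and the `handlers` mapping are all assumptions made for the example. It relies on the well-behaved input described above: no quoting, no escaping, one record per line.

```python
from typing import Any, Callable, Dict, Iterable, Iterator, Optional


def read_simple_csv(
    lines: Iterable[str],
    sep: str = ";",  # assumed separator; the real one is whatever the tool emits
    handlers: Optional[Dict[str, Callable[[str], Any]]] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield one dict per data row, converting fields while parsing.

    Assumes the first line is a header and that no field contains the
    separator or a newline (no quoting or escaping is supported).
    """
    handlers = handlers or {}
    it = iter(lines)
    header_line = next(it, None)
    if header_line is None:
        return  # empty input: nothing to yield
    header = header_line.rstrip("\r\n").split(sep)
    # Resolve each column's handler once, so the per-row loop stays cheap.
    # Columns without a handler are kept as plain strings.
    converters = [handlers.get(name, str) for name in header]
    for raw in it:
        fields = raw.rstrip("\r\n").split(sep)
        yield {
            name: convert(value)
            for name, convert, value in zip(header, converters, fields)
        }


rows = list(
    read_simple_csv(
        ["id;price;label\n", "1;2.50;foo\n", "2;10;bar\n"],
        handlers={"id": int, "price": float},
    )
)
```

Because the conversion happens inside the parsing loop, there is no second pass over the data, which is the speed argument for handlers over post-processing.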

Under these circumstances (very well-behaved CSV files, but also a wish to do some processing as part of the CSV reading), any existing library for parsing CSV would likely either be like taking a sledgehammer to crack a nut, or would have to be modified to suit the requirements.

So, writing my own (very simple) CSV reader was the right choice.

1 comments

dkarl 455 days ago

> very well-behaved CSV files

You were incredibly lucky. I've never heard of anyone who insisted on integrating via CSV files who was also capable of consistently providing valid CSV files.

aleph_minus_one 455 days ago

> I've never heard of anyone who insisted on integrating via CSV files who was also capable of consistently providing valid CSV files.

To be fair: problematic CSV files do occur. But for the functionality that the program provides, it suffices if, in such a situation, an error message is shown to the user that helps him track down where the problem with the CSV file is. Or, if the reading does not fail, the user can see in the visualization of the read data where the error in the CSV file was.

In other words: the program is not expected to gracefully

- automatically find out the "intended behaviour" (column separators, encoding, escaping, ...) of the CSV parsing,

- automatically correct incorrect input files.
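A minimal sketch of this "fail with a helpful message, do not guess" behaviour, again in Python. The names `CsvFormatError` and `read_strict`, the `;` separator, and the field-count check are assumptions chosen for illustration, not the actual program's logic.

```python
from typing import Iterable, List, Tuple


class CsvFormatError(ValueError):
    """Raised for malformed input; carries the 1-based line number."""

    def __init__(self, line_no: int, message: str) -> None:
        super().__init__(f"line {line_no}: {message}")
        self.line_no = line_no


def read_strict(
    lines: Iterable[str], sep: str = ";"
) -> Tuple[List[str], List[List[str]]]:
    """Return (header, rows); raise CsvFormatError on the first bad line.

    The separator is fixed by the caller. Nothing is auto-detected and
    nothing is repaired: a bad line is reported, not fixed.
    """
    header: List[str] = []
    rows: List[List[str]] = []
    for line_no, raw in enumerate(lines, start=1):
        fields = raw.rstrip("\r\n").split(sep)
        if line_no == 1:
            header = fields
            continue
        if len(fields) != len(header):
            raise CsvFormatError(
                line_no,
                f"expected {len(header)} fields, found {len(fields)}",
            )
        rows.append(fields)
    return header, rows


try:
    read_strict(["a;b\n", "1;2\n", "3\n"])
except CsvFormatError as err:
    message = str(err)  # "line 3: expected 2 fields, found 1"
```

The line number in the message is what lets the user track down the problem in the file.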