Hacker News
by whydoyoucare 2425 days ago
I'm not clear on what "manipulate" means here -- what is the author trying to do with the comma-separated values? FWIW, I can handle CSV manipulation with a handful of Unix utilities (sed, awk, cut) and call it a day.

The text in a column can be quoted and inside the quotes there can be escaped quotes or commas.
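To see why that matters, here is a minimal Python sketch (the sample line is made up) comparing a plain comma split, roughly what `cut -d,` does, with the standard library's `csv` module, which follows the RFC 4180 convention of quoted fields with doubled `""` quotes:

```python
import csv
import io

# Made-up sample: the second field contains a comma and an escaped ("") quote.
line = 'id,"Smith, John ""Johnny""",42\n'

# Naive split: every comma is treated as a separator, like `cut -d,`.
naive = line.rstrip("\n").split(",")

# csv.reader honours quoting; by default a doubled "" inside quotes is one literal ".
parsed = next(csv.reader(io.StringIO(line)))

print(naive)   # ['id', '"Smith', ' John ""Johnny"""', '42']
print(parsed)  # ['id', 'Smith, John "Johnny"', '42']
```

The naive split yields four fields for a three-field record, which is exactly the failure described above.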

It takes a lot of sed/awk skill to merge two columns or delete a column, if it's possible at all.
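With a real CSV parser both operations become a few lines. A minimal Python sketch, assuming the whole file fits in memory; `delete_column` and `merge_columns` are hypothetical helpers and the sample data is made up:

```python
import csv
import io


def delete_column(rows, idx):
    """Hypothetical helper: return rows without column `idx`."""
    return [row[:idx] + row[idx + 1:] for row in rows]


def merge_columns(rows, a, b, sep=" "):
    """Hypothetical helper: append column b onto column a, then drop column b."""
    merged = []
    for row in rows:
        row = list(row)
        row[a] = row[a] + sep + row[b]
        del row[b]
        merged.append(row)
    return merged


data = 'first,last,note,city\nAda,Lovelace,"wrote ""notes""","London, UK"\n'
rows = list(csv.reader(io.StringIO(data)))

rows = delete_column(rows, 2)     # drop "note" (its quoted, escaped content is irrelevant)
rows = merge_columns(rows, 0, 1)  # "first" + "last" -> "first last"

# Write back; the writer re-quotes only fields that need it ("London, UK").
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
print(out.getvalue())
# first last,city
# Ada Lovelace,"London, UK"
```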

Yes, the "CSV road to hell":

- comma separated, nothing escaped (crash when 1 column contains a comma)

- comma separated, quotes around all elements, quotes not escaped

- comma separated, double-quotes around all elements, double-quotes not escaped

- comma separated, quotes around some elements, quotes not escaped

- comma separated, double-quotes around some elements, double-quotes not escaped

- comma separated, quotes around all elements, quotes escaped (using '')

- comma separated, double-quotes around all elements, double-quotes escaped (using "")

- comma separated, quotes around some elements, quotes escaped (using '')

- comma separated, double-quotes around some elements, double-quotes escaped (using "")

- comma separated, quotes around all elements, quotes escaped (using \')

- comma separated, double-quotes around all elements, double-quotes escaped (using \")

- comma separated, quotes around some elements, quotes escaped (using \')

- comma separated, double-quotes around some elements, double-quotes escaped (using \")
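For the escaped variants, Python's `csv` module has roughly one reader setting per axis of that list: `quotechar` picks `"` or `'`, `doublequote` covers the doubled-quote escape, and `escapechar` covers backslash escaping. Quoting only some fields needs no setting, since the reader accepts quoted and unquoted fields alike. A minimal sketch with made-up rows, assuming you already know which variant a file uses. The "not escaped" variants are left out on purpose: a stray quote inside a quoted field makes them ambiguous, and no reader setting fixes that in general.

```python
import csv
import io

# One made-up row per escaping variant, paired with the reader settings assumed for it.
variants = [
    # double-quotes, escaped by doubling ("")
    ('a,"say ""hi""",b\n', {}),
    # double-quotes, escaped with a backslash (\")
    ('a,"say \\"hi\\"",b\n', {"doublequote": False, "escapechar": "\\"}),
    # single quotes, escaped by doubling ('')
    ("a,'say ''hi''',b\n", {"quotechar": "'"}),
    # single quotes, escaped with a backslash (\')
    ("a,'say \\'hi\\'',b\n", {"quotechar": "'", "doublequote": False, "escapechar": "\\"}),
]

# Parse the first (only) record of each sample with its own settings.
parsed = [next(csv.reader(io.StringIO(raw), **kwargs)) for raw, kwargs in variants]

for row in parsed:
    print(row)
```

Each variant comes out as the same three fields; only the quote character inside the middle field differs.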

And the Microsoft format (Excel in locales that use a decimal comma, where the semi-colon takes the comma's place as separator):

- semi-colon separated, nothing escaped (crash when 1 column contains a semi-colon)

- semi-colon separated, quotes around all elements, quotes not escaped

- semi-colon separated, double-quotes around all elements, double-quotes not escaped

- semi-colon separated, quotes around some elements, quotes not escaped

- semi-colon separated, double-quotes around some elements, double-quotes not escaped

- semi-colon separated, quotes around all elements, quotes escaped (using '')

- semi-colon separated, double-quotes around all elements, double-quotes escaped (using "")

- semi-colon separated, quotes around some elements, quotes escaped (using '')

- semi-colon separated, double-quotes around some elements, double-quotes escaped (using "")

- semi-colon separated, quotes around all elements, quotes escaped (using \')

- semi-colon separated, double-quotes around all elements, double-quotes escaped (using \")

- semi-colon separated, quotes around some elements, quotes escaped (using \')

- semi-colon separated, double-quotes around some elements, double-quotes escaped (using \")
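The semicolon variants parse the same way once the delimiter is set. A minimal Python sketch with a made-up sample; `csv.Sniffer` can guess the delimiter from a sample, but it is a heuristic and can guess wrong, so passing `delimiter=";"` explicitly is the safer choice when you know where the file came from:

```python
import csv
import io

# Made-up sample in the semicolon style; the decimal commas ("3,50") are
# why these files avoid the comma as a separator.
sample = 'name;price\n"Widget; large";3,50\nGadget;1,20\n'

# Option 1: state the delimiter explicitly.
rows = list(csv.reader(io.StringIO(sample), delimiter=";"))
print(rows)  # [['name', 'price'], ['Widget; large', '3,50'], ['Gadget', '1,20']]

# Option 2: let csv.Sniffer guess, restricted to the two plausible candidates.
# This is a heuristic and may fail on small or unusual samples.
dialect = csv.Sniffer().sniff(sample, delimiters=",;")
print(dialect.delimiter)
```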

And I'm not talking about weird custom CSV variants (to support multi-line fields, for example) or any other "I want to fit a circle in a square" mentality.

I don't know why people don't simply create TSV (tab-separated) files. No character-escaping mess. MUCHHH easier to parse.

EDIT: Formatting
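For comparison, a minimal Python sketch of the TSV idea with hypothetical `to_tsv`/`from_tsv` helpers. It rests on one assumption: no field ever contains a tab or a newline. Rather than inventing an escape scheme, the writer refuses such fields, so parsing stays a plain split:

```python
def to_tsv(rows):
    """Hypothetical writer: join fields with tabs and rows with newlines.

    Raises ValueError instead of escaping, because this plain TSV format
    has no escape sequences for a tab or newline inside a field.
    """
    out = []
    for row in rows:
        for field in row:
            if any(ch in field for ch in "\t\r\n"):
                raise ValueError(f"field cannot be stored in plain TSV: {field!r}")
        out.append("\t".join(row) + "\n")
    return "".join(out)


def from_tsv(text):
    """Hypothetical reader: split rows on newlines and fields on tabs."""
    lines = text.split("\n")
    if lines[-1] == "":
        lines.pop()  # drop the empty piece after the final newline
    return [line.split("\t") for line in lines]


rows = [["id", "comment"], ["1", 'Smith, John said "hi"']]
tsv = to_tsv(rows)
print(repr(tsv))  # commas and quotes pass through untouched
assert from_tsv(tsv) == rows
```

The trade-off is the one the comment implies: no escaping means some values simply cannot be stored, and the sketch makes that failure loud instead of silent.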

This is the best answer if the CSV file is more or less standard.
awk, sed, cut, etc. are excellent as long as the format is regular and you only need to process the file once.
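One way to act on that "as long as the format is regular" condition is a quick pre-flight check before reaching for `cut` or `awk`. A minimal Python sketch; `looks_regular` is a hypothetical helper, and its test (no quote characters, same number of delimiters on every non-empty line) is a conservative assumption: it can reject files those tools would handle fine, and passing it does not prove the data is clean.

```python
def looks_regular(text, delimiter=",", quotechars='"'):
    """Hypothetical pre-flight check before using cut/awk on a delimited file.

    Conservative heuristic: no quote characters at all, and the same number
    of delimiters on every non-empty line. Include "'" in quotechars to also
    reject single-quoted fields.
    """
    if any(q in text for q in quotechars):
        return False
    counts = {line.count(delimiter) for line in text.split("\n") if line.strip()}
    return len(counts) <= 1


print(looks_regular("a,b,c\n1,2,3\n"))   # True: plain split on commas is safe
print(looks_regular('a,b\n1,"x, y"\n'))  # False: quoted field
print(looks_regular("a,b,c\n1,2\n"))     # False: ragged rows
```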