HN Mirror


by ewheeler 2806 days ago

For those looking for a modern take on improving the CSV format, I'd recommend Frictionless Data's Data Package specification[1]. It basically consists of a JSON file of metadata that accompanies a CSV file and describes column types, versions, sources, and how to validate the correctness of the CSV's data. This allows for quite a lot of tooling and workflow improvements to CSV files without mucking with the CSV itself.
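To make the idea concrete, here is a minimal sketch of a descriptor plus a checker, using only the Python standard library. The descriptor mimics the general shape of a `datapackage.json` (a `resources` list whose entries carry a `schema` with typed `fields`), but the package name, file name, column names, and the `validate_csv` helper are all made up for illustration. It is not the official Frictionless tooling and covers only three types.

```python
import csv
import io

# Hypothetical descriptor in the spirit of a datapackage.json.
# The package name, path, and columns are illustrative only.
DESCRIPTOR = {
    "name": "example-package",
    "resources": [
        {
            "path": "cities.csv",
            "schema": {
                "fields": [
                    {"name": "city", "type": "string"},
                    {"name": "population", "type": "integer"},
                    {"name": "area_km2", "type": "number"},
                ]
            },
        }
    ],
}

# Only a small subset of types is handled in this sketch.
CASTS = {"string": str, "integer": int, "number": float}


def validate_csv(text, schema):
    """Return a list of error messages; an empty list means the CSV passed."""
    fields = schema["fields"]
    expected = [field["name"] for field in fields]
    reader = csv.reader(io.StringIO(text))
    header = next(reader, None)
    if header != expected:
        return [f"header mismatch: expected {expected}, got {header}"]
    errors = []
    # Data rows start on line 2 because line 1 is the header.
    for row_number, row in enumerate(reader, start=2):
        if len(row) != len(fields):
            errors.append(f"row {row_number}: expected {len(fields)} cells, got {len(row)}")
            continue
        for field, cell in zip(fields, row):
            try:
                CASTS[field["type"]](cell)
            except ValueError:
                errors.append(
                    f"row {row_number}: {field['name']}={cell!r} is not a valid {field['type']}"
                )
    return errors


schema = DESCRIPTOR["resources"][0]["schema"]
good = "city,population,area_km2\nSpringfield,30000,12.5\n"
bad = "city,population,area_km2\nSpringfield,lots,12.5\n"
print(validate_csv(good, schema))
print(validate_csv(bad, schema))
```

The CSV itself stays a plain CSV; everything the checker needs lives in the side-car metadata.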

Another hack to improve CSV workflows is OCHA's HXL[2], which is used by humanitarian organizations. It basically adds a row of hashtags in addition to the column names, which is surprisingly useful considering how easy it is to add these to a file.
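A rough sketch of how a script might consume such a file, again with just the Python standard library: it looks for the first row whose non-empty cells all start with `#` and keys each data row by those hashtags instead of the human-readable headers. The sample data and the `read_hxl` helper are invented for illustration, and real HXL tooling handles more (tag attributes, for instance).

```python
import csv
import io

# Invented sample: a normal header row, then a hashtag row, then data.
SAMPLE = """Province,Organisation,People affected
#adm1,#org,#affected
Coast,Red Cross,1200
Plains,UNICEF,850
"""


def read_hxl(text):
    """Return the data rows as dicts keyed by hashtag (hypothetical helper)."""
    rows = list(csv.reader(io.StringIO(text)))
    for index, row in enumerate(rows):
        non_empty = [cell.strip() for cell in row if cell.strip()]
        # The hashtag row is the first row where every non-empty cell starts with '#'.
        if non_empty and all(cell.startswith("#") for cell in non_empty):
            tags = [cell.strip() for cell in row]
            return [dict(zip(tags, data)) for data in rows[index + 1:]]
    raise ValueError("no hashtag row found")


records = read_hxl(SAMPLE)
total_affected = sum(int(record["#affected"]) for record in records)
print(records[0]["#adm1"], total_affected)
```

Because the code keys on the hashtags, the human-readable headers can be renamed or translated without breaking it.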

[1] https://frictionlessdata.io/docs/tabular-data-package/
[2] http://hxlstandard.org/

2 comments

geraldbauer 2805 days ago

FYI: The (tabular) data package is great / fantastic, but it sits one layer up the stack.

See https://github.com/csv11/csvpack for real-world data package examples.

By the way, the tabular data CSV dialect specification is a great start/initiative (mostly a 1:1 copy of the Python parser's options :-), but it really needs an update with more options to reflect the reality of the CSV formats out there. The big insight and breakthrough: CSV is NOT one spec or format but various flavors / formats / dialects, so let the computer (that is, the csvreader library) handle it.
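To illustrate the "1:1 copy" point, here is a small sketch that maps a dialect descriptor onto the format parameters of Python's standard `csv` module. The camelCase keys are how I recall them from the CSV dialect spec, so treat them as an assumption; the `KEY_MAP` table and `read_with_dialect` helper are my own, not part of any library.

```python
import csv
import io

# Hypothetical dialect descriptor; key names assumed to follow the CSV dialect spec.
DIALECT = {
    "delimiter": ";",
    "quoteChar": "'",
    "doubleQuote": True,
    "skipInitialSpace": True,
}

# Descriptor key -> keyword argument accepted by csv.reader.
KEY_MAP = {
    "delimiter": "delimiter",
    "quoteChar": "quotechar",
    "doubleQuote": "doublequote",
    "skipInitialSpace": "skipinitialspace",
    "escapeChar": "escapechar",
}


def read_with_dialect(text, descriptor):
    """Parse CSV text using the options named in the descriptor."""
    # Keys this sketch does not know about are silently ignored.
    fmtparams = {KEY_MAP[key]: value for key, value in descriptor.items() if key in KEY_MAP}
    return list(csv.reader(io.StringIO(text), **fmtparams))


text = "name; note\n'Smith; J'; 'it''s fine'\n"
rows = read_with_dialect(text, DIALECT)
print(rows)
```

The same file fed to a default comma-delimited reader would come out as one mangled column per row, which is the whole argument for describing the flavor and letting the library deal with it.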


_pwalsh 2805 days ago

One of the authors of the Frictionless Data specifications here. The spec directly relevant to CSV is Table Schema [1], and we’ve also got some nice tools that leverage Table Schema and the family of specifications, such as goodtables [2].

[1] https://frictionlessdata.io/specs/table-schema/
[2] https://github.com/frictionlessdata/goodtables-py


robochat42 2805 days ago

This looks interesting and useful. It might be a feature that wouldn't get used a lot, but I've always felt that it would be great to have a 'unit' field to specify the physical units of the values. Have you ever discussed adding this as an optional field to the specification?


_pwalsh 2805 days ago

We’ve looked into it:

https://github.com/frictionlessdata/specs/issues/537

Feel free to add your use cases to that issue.
