by ewheeler
2806 days ago
For those looking for a modern take on improving the CSV format, I'd recommend Frictionless Data's Data Package specification[1], which basically consists of a JSON file of metadata that accompanies a CSV file and describes column types, versions, sources, and how to validate the correctness of the CSV's data. This allows for quite a lot of tooling and workflow improvements to CSV files without mucking with the CSV itself.

Another hack to improve CSV workflows is OCHA's HXL[2], which is used by humanitarian organizations. It basically adds a row of hashtags in addition to the column names, which is surprisingly useful considering how easy it is to add these to a file.

[1] https://frictionlessdata.io/docs/tabular-data-package/
[2] http://hxlstandard.org/
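
To make the "JSON file of metadata next to the CSV" idea concrete, here is a rough sketch using only the Python standard library. The descriptor, the file name `cities.csv`, the sample rows, and the `validate` helper are all made up for illustration, and the type check is a hand-rolled stand-in rather than the official Frictionless tooling.

```python
import csv
import io
import json

# Hypothetical descriptor, shaped like a datapackage.json file.
DESCRIPTOR = json.loads("""
{
  "name": "example-package",
  "resources": [{
    "name": "cities",
    "path": "cities.csv",
    "schema": {
      "fields": [
        {"name": "city", "type": "string"},
        {"name": "population", "type": "integer"},
        {"name": "area_km2", "type": "number"}
      ]
    }
  }]
}
""")

# Stand-in for the contents of cities.csv (made-up values).
CSV_TEXT = "city,population,area_km2\nSpringfield,30000,65.5\nShelbyville,n/a,40\n"

# Only three field types are handled in this sketch.
CASTERS = {"string": str, "integer": int, "number": float}


def validate(csv_text, schema):
    """Return (line_number, field_name, bad_value) for each cell that fails to cast."""
    errors = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        for field in schema["fields"]:
            value = row[field["name"]]
            try:
                CASTERS[field["type"]](value)
            except ValueError:
                errors.append((line_number, field["name"], value))
    return errors


errors = validate(CSV_TEXT, DESCRIPTOR["resources"][0]["schema"])
print(errors)
```

The CSV itself stays untouched; everything the checker knows about column types comes from the descriptor.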
|
See https://github.com/csv11/csvpack for real-world data package examples.
By the way, the tabular data CSV dialect specification is a great start/initiative (mostly a 1:1 copy from the Python parser :-), but it really would need an update, with more options, to reflect the reality of the CSV formats out there. The big insight and breakthrough: CSV is NOT one spec or format but various flavors / formats / dialects; let the computer (that is, the csvreader library) handle it.
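
As a small illustration of "describe the dialect and let the library handle it", here is a sketch that feeds a dialect description into Python's built-in `csv` module. The descriptor key names (`delimiter`, `quoteChar`, `doubleQuote`, `skipInitialSpace`) are my assumption of how such a description might be spelled, and `KEY_MAP` and `read_with_dialect` are hypothetical helpers, not part of any spec or library.

```python
import csv
import io

# Assumed mapping from dialect-descriptor keys to csv.reader keyword arguments.
KEY_MAP = {
    "delimiter": "delimiter",
    "quoteChar": "quotechar",
    "doubleQuote": "doublequote",
    "skipInitialSpace": "skipinitialspace",
}


def read_with_dialect(text, dialect):
    """Parse CSV text using the options named in a dialect description."""
    kwargs = {KEY_MAP[key]: value for key, value in dialect.items() if key in KEY_MAP}
    return list(csv.reader(io.StringIO(text), **kwargs))


# Two flavors of "CSV" that need different parser settings.
semicolon = {"delimiter": ";", "quoteChar": "'"}
padded = {"delimiter": ",", "skipInitialSpace": True}

rows_a = read_with_dialect("name;note\nAda;'likes; semicolons'\n", semicolon)
rows_b = read_with_dialect("name, note\nAda, plain\n", padded)
print(rows_a)
print(rows_b)
```

The same reading code handles both files; only the dialect description changes.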