by _pwalsh 3185 days ago
Hi,

(I work on the Frictionless Data specifications and tooling at Open Knowledge International.)

CSV has many, many warts. However, it is the best thing we have right now for serialising data in a way that is easily read by humans (and consumer-grade software) and by machines. Libraries like our Tabulator [1], which is used under the hood, provide an API that deals with many of the gotchas of the format.

[1]: https://github.com/frictionlessdata/tabulator-py
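
To make "gotchas" concrete, here is a minimal sketch that uses only Python's standard library. It is not Tabulator's API. The sample bytes and the `read_rows` helper are made up for illustration, and the three problems shown (a byte-order mark, a non-comma delimiter, a newline inside a quoted field) are just common examples of what such a library has to absorb.

```python
import csv
import io


def read_rows(raw: bytes) -> list:
    """Decode raw CSV bytes and return the rows as lists of strings.

    Hypothetical helper for illustration only.
    """
    # "utf-8-sig" drops a leading byte-order mark if one is present.
    text = raw.decode("utf-8-sig")
    # Guess the dialect from a sample, limited to a few common delimiters.
    dialect = csv.Sniffer().sniff(text[:1024], delimiters=",;\t")
    # newline="" keeps line endings untouched so the csv module can
    # handle newlines that sit inside quoted fields.
    return list(csv.reader(io.StringIO(text, newline=""), dialect))


# Made-up sample: BOM, semicolon delimiter, delimiter and newline inside quotes.
raw = b'\xef\xbb\xbfid;name;note\r\n1;"Smith; John";"line one\nline two"\r\n'
rows = read_rows(raw)
for row in rows:
    print(row)
```

Delimiter sniffing is a heuristic and can guess wrong on small or unusual samples, which is part of why a dedicated library is attractive.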

1 comment

Thanks, will have a look at Tabulator. I appreciate the list of validators you published on the FD site; it can help at least a bit when working with non-techies to vet their data before submission.
Not on the list is ETLyte (https://sorrell.github.io/etlyte/), which I built so that an insurance company's corporate clients could vet their flat files before submitting them. It has worked very well across multiple files, with custom validations, and is very speedy (it uses SQLite). As far as "non-techies" go, it's pretty straightforward, but it is confined to the command line, so I guess I need to get working on a web frontend for this :)
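
For readers wondering what SQLite-backed validation of a flat file can look like, here is a rough sketch. It is not ETLyte's code or configuration format. The table name `staging`, the columns, the sample data, and the two rules are all hypothetical. The idea shown is only the general one: load the rows into an in-memory table, then express each check as a query that selects the failing rows.

```python
import csv
import io
import sqlite3

# Hypothetical flat file: data rows 2 and 3 each break one rule.
FLATFILE = """policy_id,premium
P001,120.50
,99.00
P003,-5
"""

# Hypothetical rules: each query returns the rowids of rows that FAIL.
RULES = {
    "policy_id is required":
        "SELECT rowid FROM staging WHERE policy_id = '' ORDER BY rowid",
    "premium must be non-negative":
        "SELECT rowid FROM staging WHERE CAST(premium AS REAL) < 0 ORDER BY rowid",
}


def validate(text: str) -> dict:
    """Load CSV text into an in-memory SQLite table and run every rule."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE staging (policy_id TEXT, premium TEXT)")
    reader = csv.reader(io.StringIO(text))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO staging VALUES (?, ?)", reader)
    failures = {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in RULES.items()
    }
    conn.close()
    return failures


failures = validate(FLATFILE)
for rule, bad_rows in failures.items():
    print(rule, "->", bad_rows)
```

Keeping the rules as plain SQL means new checks can be added without touching the loading code, and SQLite does the scanning.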