Hacker News new | ask | show | jobs
by mschuster91 617 days ago
SQL interop is where the pain is at. Using standard tooling of most database systems, best you can get is CSV with all the pains this shithole of a data transfer format brings.
2 comments

That doesn't match my experience at all. Every database I'm aware of has multiple options for data export and import for batch and command processing in the standard tooling. CSV is almost never "the best you can get". What is going to be best depends on your use case, and indeed at a bare minimum you can almost always change the field and line seperators and get something that works just fine on the commandline and avoids most of the stuff people find hard about CSV.

Additionally, if people like jq, they don't need a special tool like this. They can just get the database to output json and use json/jq - another tool isn't needed. In postgres you can do something like this

   select array_to_json(array_agg(row_to_json (r))) from (
      ... put your sql query here...
   ) r;
...and postgres will output a json array for you where each item in the array is a json map of a single row of the output.

I'm sure other databases have similar functionality.

What's wrong with CSV? I love me a CSV file.
Having the control characters mixed in with the data, and having no defined encoding.
Out of the random shit I had to deal with CSV file wrangling in the last two years:

- no defined encoding, so it may be anything from US-ASCII over ISO-8859-x to UTF-8 and it's always a guesswork what it actually is

- no definition of the separator sign, usually it can be a literal comma, a semicolon, or tabs

- escaping of the separator sign in column values is always fun

- escaping of newlines is even MORE fun

- line endings are not specified

Every piece of software there is has their own expectations on what specific kind of format it expects, and to make it worse, using "Microsoft Excel" as a target/source isn't foolproof either because it behaves differently between the Mac and the Windows version!

JSON in contrast is clearly defined in all of these issues and has battle-tested implementations in every programming language there is.

Most of these issues while real don't actually arise in this case, because we're not trying to ETL some random file, we are the ones talking to the database so we get to choose exactly how the data gets formatted on extract.

For example, here's your list fully handled in postgres: 1. SET CLIENT_ENCODING TO 'value'; (eg 'UTF8') 2. COPY ... with FORMAT CSV DELIMITER 'delimiter_character' QUOTE 'quote_character'

Now the output format is fully specified and everything just works fine (including for input into excel)

   The values in each record are separated by the DELIMITER character. If the value contains the delimiter character, the QUOTE character, the NULL string, a carriage return, or line feed character, then the whole value is prefixed and suffixed by the QUOTE character, and any occurrence within the value of a QUOTE character or the ESCAPE character is preceded by the escape character.
https://www.postgresql.org/docs/current/sql-copy.html
Yeah, the problem is not every application can read such files. Particularly when newlines are involved, it's a hit and miss.
Well not every application can read any file and either way "sq" isn't going going to make any difference. Like I said they are real issues but don't arise in this case.