| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by _wmd 2733 days ago

As much as I love SQLite, and while it is open source, it is a single implementation that AFAIK has no published open specification. The only way to read an SQLite file is using SQLite, and in that respect, it is for many users just as closed as wrapping something in a word document.

CSV isn't perfect, but it provides a ton of flexibility, for example, CSVs can be streamed or support parallel segmented download across a network with useful work possible during the transfer. The format is so simple that it can approach almost free to parse (see e.g. my own https://github.com/dw/csvmonkey ).

CSV is also distinguished in that regular home users with spreadsheet programs can usually do most things a developer can do with the same file. For me user empowerment trumps all other goals in software, including warts. Things like JSON, XML or SQLite definitely don't fit in that category, although I guess SQLite is at least better due to the wide availability of decent GUIs for it.

Finally as a data transfer format, SQLite has the potential to be massively inefficient. Done incorrectly it can ship useless indexes that can inflate size >100%, and even in the absence of those, depending on how amenable the data is to being stored in a btree and the access patterns used to insert it, can leave tons of wasted space inside the file, or AFAIK even chunks of previously deleted data.

5 comments

darkpuma 2733 days ago

>"As much as I love SQLite, and while it is open source, it is a single implementation that AFAIK has no published open specification. The only way to read an SQLite file is using SQLite, and in that respect, it is for many users just as closed as wrapping something in a word document."

That's an extreme position to take, particularly since the SQLite code is public domain. Furthermore it's one of the formats recommended by the Library of Congress for archival/data preservation:

https://www.loc.gov/preservation/resources/rfs/data.html

https://www.sqlite.org/locrsf.html

_wmd 2733 days ago

> The only way to read an SQLite file is using SQLite

This part unfortunately isn't a position, it's absolute. It's hard to imagine a situation where as a developer we would not have access to a C runtime or for any reason whatsoever would not be able to use SQLite, but the hard dependency on its code is real, and represents a real hazard in the wrong environment. A super easy example would be parsing data on say, a tiny microcontroller on an IOT device. This can start to hurt quickly:

> Compiling with GCC and -Os results in a binary that is slightly less than 500KB in size

Open formats at least give you the option of implementing whatever minimal hack is necessary to finish your job without say, introducing some intermediary to do an upfront conversion, and at least for this reason SQLite cannot really be considered a perfectly universal format

lultimouomo 2733 days ago

>> The only way to read an SQLite file is using SQLite

> This part unfortunately isn't a position, it's absolute.

It's also false. I know of SQLJet which is a pure java implementation, there may be others. But in the end, the SQLite format being well defined and documented [1], a sure-fire way to read an SQLite file is writing the code to read an SQLite file. Since SQLite a rock-solid, public domain, portable C library, it might not be the best idea to do that, but it is completely feasible. No one stops you from "implementing whatever minimal hack is necessary to finish your job" while using the SQLite format.

[0] https://sqljet.com

[1] https://www.sqlite.org/fileformat.html

ufmace 2733 days ago

> A super easy example would be parsing data on say, a tiny microcontroller on an IOT device.

I mean, it's a point, but I don't think anybody is saying that SQLite should replace all data storage formats everywhere. If you're just storing a few dozen short text strings with keys, plain text is fine. I don't think you'd want to have a JSON parser on a tiny microcontroller either.

darkpuma 2733 days ago

By 'position' I was very obviously referring to

>"as closed as wrapping something in a word document."

Sure CVS makes it trivial to waste your time reinventing the wheel, making your own parser. The situations were there are technical limitations that prevent the use of sqlite are becoming vanishingly rare. (Not to mention the resources necessary to use sqlite is unrelated to how many implementations there are or whether it's 'open' or 'closed'.)

Skunkleton 2733 days ago

1) For practical purposes the vast majority of platforms that are inappropriate for SQLite are not appropriate for CSV.

2) SQLite has no standard. The same is true for CSV in practice. At least SQLite has a high quality reference implementation.

3) It’s a shame that SQLite doesn’t have a standard of some sort.

kccqzy 2732 days ago

It's not extreme. It's the whole reason why WebSQL failed.

darkpuma 2732 days ago

Do you have a source for that claim? Mozilla's 2010 justification for dropping WebSQL was that SQLite doesn't implement a particular standard of SQL (incidentally no SQL implementation does, SQL is notorious for that, every vendor implements something slightly different with parts of the various standards excluded and other features included.)

But more to the point, Mozilla's gripe was with SQLite's API, not with the file format used by SQLite. I can't find any source for the file format used by SQLite being the problem that tripped up WebSQL.

tyingq 2733 days ago

"AFAIK has no published open specification. The only way to read an SQLite file is using SQLite"

It's documented pretty well: https://www.sqlite.org/fileformat.html

I suppose that doesn't make it an open standard, but it hasn't changed much.

sjwright 2733 days ago

Not sure what an open standard gets you above what SQLite already is. You can’t get much more open than that.

tyingq 2733 days ago

That's basically the non-subtle version of what I was trying to convey :)

_wmd 2733 days ago

I had never seen this before! Thanks

nabla9 2733 days ago

You argue bases on a good principle. But while the principle of open well documented standard is good one, it's just a principle. The goal is portability, consistence and long term stability.

SQLite is a Recommended Storage Format for datasets according to the US Library of Congress. https://www.loc.gov/preservation/digital/formats/fdd/fdd0004...

Nydhal 2733 days ago

You can open a SQLite file as text and read it and be able to discern data and tables.

Not saying it's useful but you saying "only way to read SQLite is SQLite" is hyperbole.

simonh 2733 days ago

>...it is for many users just as closed as wrapping something in a word document.

Which users? I’m curious.

Cross platform, fully documented, Fred software, multiple implementations vs closed proprietary single platform commercial software. Really?