Hacker News new | ask | show | jobs
by TheZenPsycho 4320 days ago
Do you think in theory that it is possible to create a flat file based system that is multi-thread/multiuser safe?

My somewhat ignorant inclination is to suggest that maildir fits this description.

2 comments

Certainly. SQLite is a great example of one. It handles the tradeoff elegantly: reads can be done in parallel but writes have to be sequential.

I don't believe maildir would be described as a flat file.

Well I suppose then maybe I have the wrong idea of what "flat file database" means. To you anyway. I would not actually describe sqlite as a flat file database system... Since the data cannot be read safely directly from the file, using a text editor, or -- since a flat file could be binary data, any arbitrary program that just knows the format. And "flat" implies there is no special data structure involved. I would think that would not be the case for sqlite given how well it performs, it must use some kind of binary tree or clever indexing system of some sort.

You must go via the actual sqlite process.

On the other hand maildir, composed of multiple ascii compatible text files, can be read by any email client- provided it follows the protocol for reading and writing of files in the maildir spec.

The term "flat file database" conventionally refers to a database that is serialized to a single file, whether it is plain text or a binary. The emphasis here is on the single file aspect: it's really easy to transfer the entire database between machines.
Another meaning of "flat file database" is non-relational -- the kind of thing you might create if you were unaware of normal form and tried to cram the entire data model into a single table. This single table can reside in a single flat file. The file is flat as in not related to any other files. Here is an example "flat file" data model:

emp_id emp_fname emp_lname emp_phone1 emp_phone1_type emp_phone2 emp_phone1_type dept_id dept_manager dept_manager_phone1 dept_manager_phone1_type

This approach has all kinds of problems (solved 40+ years ago by Codd and others). Examples: how do you add a third phone for an employee. How do you keep the managers phone from getting out of sync across rows.

I took "flat JSON file database" to mean essentially the same thing. Something like this:

emp_id emp_JSON_object

JSON perhaps solves some issues (e.g. adding a third phone for an employee), but still suffers from most of the issues (e.g. keeping the managers phone from getting out of sync across employees). Plus it suffers from new issues, namely the fact that each JSON object could have its own layout (e.g. one of them could have a third phone) so there is a bunch of parsing overhead compared to say knowing that phone number is at a specific offset on every row.

MyISAM serializes each table to a single file on disk (well, three files, one each for the definition, data and indexes).

Would you consider MySQL to be a flat file database?

I wouldn't, but I'm not really an expert here. If every table is three files, that implies a non-trivial database would be dozens of files.

I suppose the other use of "flat file" I've heard is a flat directory with no sub-directories, but I've never heard that applied to a "flat file database".

>>And "flat" implies there is no special data structure involved.

Flat files doesn't mean neither text file, nor human-readable. Text format, human-readable format is a terrible one to store data, esp. numbers.

>> since a flat file could be binary data...
Should be noticed, the semantics of SQLite behavior in multithreaded mode is pretty cryptic, even to its long-term users (and others just silently give up, and hand-roll coarse-grained locks, I suspect):

http://sqlite.1065341.n5.nabble.com/think-I-need-better-erro...

The linked discussion has one of the most succinct comments about error handling I've seen.

You handle the result codes you know what to do with, and everything that remains means your program should explode immediately.

:) problem is, there's no clear understanding of how one can avoid the possibility of unwanted error codes ever returned in his program, but yes
SQLite is not a flat file database, its an inprocess database that doesn't use flat files.
"As of 2011, one of the most popular flat file database engines is SQLite"

https://en.wikipedia.org/wiki/Flat_file_database#Contemporar...

"It is in Wikipedia" ⇏ "It is true"

SQLite files are not "flat files" by any definition commonly in use, including the "narrow" and "broader" definitions in that article. (Which, incidentally, is almost entirely unsourced fact claims, which violates Wikipedia's own quality standards.)

Great, you take issue with these things. Do you have a source you can point to? Could you use your time improving Wikipedia or at least pointing us poor saps on HN to more detailed information? That seems like it would be more useful than looking for a minor UTF-8 character to "prove" your point.

What are the definitions in common use that you know? Is it possible there are others you aren't aware of?

There is zero issue to have a flat file database that's all multi. MySQL InnoDB, the transactional one engine, can be a single file database. [Small exception - part of the schema is stored on two separates files for easier access, there won't be a big deal to store them in the big file. Transaction log is also not in the same file for obvious reasons]

If you want a full serialization of the memory to the 'flat' file - then no. It just makes no sense. Yet, a single file databases that's multi-user, mulch-threaded and mult-transaction is all viable.