by Beltalowda 1535 days ago
Another issue with "just a JSON file" as a database is that you need to be a bit careful to avoid race conditions and the like, e.g. if two web pages try to write the same database at the same time. It's not an issue for all applications, and not that hard to get right, but does require some effort. This is a huge reason I prefer SQLite for simple file storage needs.

A normal Express app (assuming it's one process per JSON file) shouldn't have that problem, because JavaScript is single-threaded.
It can definitely be a problem in Node.js. Assuming the workflow is read from disk -> modify -> write to disk, and that you're using the async fs functions, two async code paths running at the same time will have last-write-wins semantics and will lose data.

That's the naive scenario. If all code paths write out a global data structure, then it'd be fine. Or if the file is written append-only instead of as a single, atomic data structure, then it could be fine.
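To make the read -> modify -> write race concrete, here is a minimal sketch in Node. The helper names (`naiveIncrement`, `serializedIncrement`, `demo`), the `count` field, and the 50 ms `sleep` are all made up for illustration; the sleep just stands in for whatever other awaited work might sit between the read and the write. Chaining updates onto a single promise is only one possible fix, not something the thread prescribes.

```javascript
const fs = require("node:fs/promises");
const os = require("node:os");
const path = require("node:path");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Naive read -> modify -> write. Two calls that overlap both read the
// old value, so one of the increments is lost (last write wins).
async function naiveIncrement(file) {
  const data = JSON.parse(await fs.readFile(file, "utf8"));
  await sleep(50); // placeholder for any other awaited work
  data.count += 1;
  await fs.writeFile(file, JSON.stringify(data));
}

// One possible fix: chain every update onto a single promise so that
// each read -> modify -> write cycle finishes before the next starts.
let queue = Promise.resolve();
function serializedIncrement(file) {
  const run = queue.then(() => naiveIncrement(file));
  queue = run.catch(() => {}); // keep the chain usable if one update fails
  return run;
}

// Runs two overlapping increments each way and reports the final counts.
async function demo() {
  const dir = await fs.mkdtemp(path.join(os.tmpdir(), "json-db-"));
  const file = path.join(dir, "db.json");

  await fs.writeFile(file, JSON.stringify({ count: 0 }));
  await Promise.all([naiveIncrement(file), naiveIncrement(file)]);
  const naive = JSON.parse(await fs.readFile(file, "utf8")).count;

  await fs.writeFile(file, JSON.stringify({ count: 0 }));
  await Promise.all([serializedIncrement(file), serializedIncrement(file)]);
  const serialized = JSON.parse(await fs.readFile(file, "utf8")).count;

  await fs.rm(dir, { recursive: true, force: true });
  return { naive, serialized };
}
```

Calling `demo()` returns both counts. The naive path loses an increment even though only one thread ever runs JavaScript, because the two code paths interleave at each `await`.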

You are confusing parallelism with concurrency. It definitely can be a problem.
Is it possible for a write to be interrupted during its turn in the event loop and get crossed with another?
Hmm. I wouldn't think so, but I don't actually know

Still, given the strategy at hand, the in-memory JS object (exclusively single-threaded) is the source of truth, and just gets mirrored in the file system (and doesn't get read again until the next startup). So you should have an eventual-consistency situation in the worst case (any racing issue between file-writes would just put the file in a stale state, and the next file-write would bring it back up to consistency)
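A rough sketch of that strategy, assuming a hypothetical `createStore` helper (not from the thread or any library): the in-memory object is the source of truth, and every change queues a write of the whole object. Writes are chained so they never overlap, and each one serializes the state as it is when the write starts, so the last queued write always brings the file up to date.

```javascript
const fs = require("node:fs/promises");

// Hypothetical helper: state lives in memory and is mirrored to `file`.
function createStore(file, initial = {}) {
  const state = { ...initial };
  let pending = Promise.resolve();

  function persist() {
    // JSON.stringify runs when this write's turn comes, not when it was
    // queued, so a later write can never put older data on disk.
    pending = pending
      .catch(() => {}) // a failed write should not block later ones
      .then(() => fs.writeFile(file, JSON.stringify(state)));
    return pending;
  }

  return {
    get: (key) => state[key],
    set(key, value) {
      state[key] = value; // synchronous, so in-memory updates cannot race
      return persist();
    },
    // Resolves once the most recently queued write has finished.
    flushed: () => pending,
  };
}
```

The file can still lag behind memory for a moment, and a crash in that window loses the newest changes, which is the eventual-consistency trade-off described above.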

Doesn't the fact that it's opened in append-only mode (Linux) mitigate data races with regard to writes?
Your write will be fine; that is, it's not as if data from one write will be interspersed with the data from another write. It's just that the order might be wrong, or opening the file multiple times (possibly from multiple processes) could be fun too. The program or computer crashing mid-write can also cause problems. Things like that.

Again, may not be an issue at all for loads of applications. But I used a lot of "flat file databases" in the past, and found it's not an issue right up to the point that it is. Overall, I found SQLite simple, fast, and ubiquitous enough to serve as a good fopen() replacement. In some cases it can even be faster!
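For the crash-mid-write case, one common mitigation (not something proposed in this thread) is to write the new contents to a temporary file and then rename it over the real one. The helper name `writeJsonAtomic` and the temp-file naming are assumptions for illustration. This relies on POSIX `rename()` replacing the target in a single step; behaviour on other platforms may differ.

```javascript
const fs = require("node:fs/promises");

// Hypothetical helper: replace `file` with the JSON form of `value`
// without ever leaving a half-written file under the real name.
async function writeJsonAtomic(file, value) {
  // Same directory as the target, so the rename stays on one filesystem.
  // The naming scheme is arbitrary.
  const tmp = `${file}.${process.pid}.tmp`;

  const handle = await fs.open(tmp, "w");
  try {
    await handle.writeFile(JSON.stringify(value));
    await handle.sync(); // ask the OS to flush the data before renaming
  } finally {
    await handle.close();
  }

  // Readers see either the old file or the new one, never a mix.
  await fs.rename(tmp, file);
}
```

This only addresses torn files. It does nothing about lost updates from overlapping read -> modify -> write cycles, and two overlapping calls in the same process would share the same temp name, so calls still need to be serialized.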

> Your write will be fine; that is, it's not as if data from one write will be interspersed with the data from another write.

Are you sure? I thought it could be if the first write had more data than the size of the kernel/fs-driver buffer, not all of it would be written, and then it could be interrupted when another thread calls write() with a small buffer that gets written in one go.

No, I'm not sure, haha. In my experience it usually works like that, but no doubt there could be edge cases there, too. Another good reason to use SQLite.
Here is my list of numbers: 1,Here is my list of letters: a,b,2,3,d
Although not a POSIX requirement, in practice for unix-like systems, file writes are atomic across concurrent writers.

You may be thinking of stdio buffering, where calls to printf etc. get split into multiple write calls. In those cases, it's possible to get errant interleaved writes.
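As a sketch of the append-only approach in Node, assuming one small JSON record per line and hypothetical helpers `appendRecord` and `readRecords`: each record is built as one complete string and handed to a single `fs.appendFile` call (which opens the file with the `a` flag), so nothing in the application splits a record across calls. Whether the OS keeps larger writes or writes from multiple processes intact is exactly what is being debated here, so treat this as illustrative only.

```javascript
const fs = require("node:fs/promises");

// Hypothetical helper: append one record as a single line of JSON.
// Assumes records are small enough to go out in one write call.
async function appendRecord(file, record) {
  await fs.appendFile(file, JSON.stringify(record) + "\n");
}

// Read every record back; the order is whatever order the appends landed in.
async function readRecords(file) {
  const text = await fs.readFile(file, "utf8");
  return text
    .split("\n")
    .filter(Boolean) // skip the empty string after the final newline
    .map((line) => JSON.parse(line));
}
```

Overlapping `appendRecord` calls may land in a different order than they were issued, so any ordering that matters has to be carried inside the record itself (a sequence number or timestamp, for instance).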

It eliminates them if they're smaller than PIPE_BUF (IIRC, Beltalowda, dmoy, and stevenhuang are wrong about this), but the thing that prevents data races with regard to writes is running the application in Node, which is completely single-threaded.