Hacker News
by dwenzek 1223 days ago
I just found this fairly old paper, and I was surprised to discover that the idea of append-only storage is not 20 years old but more than 40!

The oldest work I was aware of was "The Design and Implementation of a Log-Structured File System" (1).

So it was a pleasure to learn that these ideas were already around in the '80s:

- Deletion considered harmful

- A non-deletion strategy using timestamps

- The importance of accessing past data

- A non-deletion strategy can improve both integrity and reliability
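For anyone who wants to see the "non-deletion strategy using timestamps" idea in concrete form, here is a minimal Python sketch. It is not the scheme from the paper: the `VersionedStore` class, its method names, and the use of integer timestamps are all hypothetical. Every write is appended with a timestamp, a delete is just another appended record (a tombstone), and any past state can be read back with an "as of" query.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

_TOMBSTONE = object()  # marker meaning "deleted as of this timestamp"


@dataclass
class VersionedStore:
    """Hypothetical append-only key/value store: writes never overwrite."""
    log: list = field(default_factory=list)  # (timestamp, key, value), in append order

    def put(self, ts: int, key: str, value: Any) -> None:
        # Keep the log ordered by time so reads can stop early.
        if self.log and ts < self.log[-1][0]:
            raise ValueError("timestamps must be non-decreasing")
        self.log.append((ts, key, value))

    def delete(self, ts: int, key: str) -> None:
        # "Deletion" is just another appended record; older versions stay readable.
        self.put(ts, key, _TOMBSTONE)

    def get(self, key: str, as_of: Optional[int] = None) -> Any:
        """Return the value visible at time `as_of` (latest if None), or None."""
        result = None
        for ts, k, v in self.log:
            if as_of is not None and ts > as_of:
                break  # log is time-ordered, so nothing later can qualify
            if k == key:
                result = None if v is _TOMBSTONE else v
        return result


store = VersionedStore()
store.put(1, "balance", 100)
store.put(5, "balance", 80)
store.delete(9, "balance")
print(store.get("balance"), store.get("balance", as_of=6))
```

A linear scan keeps the sketch short; a real system would index versions per key, but the property that matters here is that no call ever removes or rewrites an existing log entry.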

(1) https://dl.acm.org/doi/10.1145/146941.146943


Sadly, 1992 was 31 years ago. The authors pushed for log-structured filesystems in an earlier 1988 paper, "Beating the I/O Bottleneck": https://www2.eecs.berkeley.edu/Pubs/TechRpts/1988/5760.html. It was the inspiration for many storage appliances, NetApp probably being a very strong example.

Many people were thinking about these ideas in the '88-'92 timeframe. Tape storage systems are, roughly speaking, append-only, so many of the ideas behind a log-structured filesystem revolve around the better random-read performance of disk drives.
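To make the tape-vs-disk point concrete, here is a toy sketch of the log-structured idea: every block write becomes a sequential append (tape-like), while reads use an index to jump straight to the newest copy (which needs a random-access disk). The `TinyLogFS` class and its methods are hypothetical. Real log-structured filesystems add an inode map, fixed-size segments and a background cleaner, all of which are heavily simplified here.

```python
class TinyLogFS:
    """Toy log-structured block store (hypothetical, for illustration only)."""

    def __init__(self):
        self.log = []    # the "disk": one append-only sequence of block records
        self.index = {}  # (file, block_no) -> position of the newest copy in self.log

    def write_block(self, file: str, block_no: int, data: bytes) -> None:
        # Never update in place: append, then repoint the index.
        self.log.append((file, block_no, data))
        self.index[(file, block_no)] = len(self.log) - 1

    def read_block(self, file: str, block_no: int) -> bytes:
        # Random read: the index tells us exactly where the live copy is.
        return self.log[self.index[(file, block_no)]][2]

    def live_fraction(self) -> float:
        # Share of log records that are still the current version of some block.
        return len(self.index) / len(self.log) if self.log else 1.0

    def clean(self) -> None:
        """Crude stand-in for segment cleaning: copy only live blocks into a new log."""
        new_log, new_index = [], {}
        for key, pos in self.index.items():
            new_index[key] = len(new_log)
            new_log.append(self.log[pos])
        self.log, self.index = new_log, new_index


fs = TinyLogFS()
fs.write_block("a", 0, b"v1")
fs.write_block("a", 0, b"v2")  # overwrite becomes an append
print(fs.read_block("a", 0), fs.live_fraction())
```

The cost of never overwriting shows up as dead records in the log, which is why some form of cleaning or compaction is unavoidable once space is finite.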

A non-deletion strategy should consider including an encryption and key-management strategy that enables retroactive secure deletion without hurting availability, reliability, or performance. This seems to be missing from a lot of systems that handle personal information.
Absolutely. And not only that, but you need storage that you can actually delete the keys from, even if the primary storage is append-only. It may sound like a trivial detail, but shredding gets harder with every new layer of “smartness” that SSDs and file systems provide for convenience.
Also called crypto-shredding! We had this issue trying to square GDPR-style requirements with an append-only store.
Paper-based accounting was append-only, so I think the idea has always been there but was uneconomic on machine-readable media for a long time.

(In particular, "new master = old master + updates" card/tape jobs were append-only in principle but, due to the finite number of tapes, overwriting in practice.)
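The "new master = old master + updates" job is essentially a sequential merge of two key-sorted streams. Here is a sketch, assuming records are `(key, data)` pairs, updates are `(key, new_data)` with `None` meaning delete, and at most one update per key. The function name and record layout are hypothetical. The old master is only read, never modified, so in principle it could be kept; reusing its reel for the next run was the overwriting part.

```python
from typing import Iterable, Iterator, Optional, Tuple

Record = Tuple[str, str]            # (key, data) on the master file
Update = Tuple[str, Optional[str]]  # (key, new data), or (key, None) to delete


def new_master(old: Iterable[Record], updates: Iterable[Update]) -> Iterator[Record]:
    """Merge a key-sorted old master with key-sorted updates (at most one per key)."""
    old_it, upd_it = iter(old), iter(updates)
    o = next(old_it, None)
    u = next(upd_it, None)
    while o is not None or u is not None:
        if u is None or (o is not None and o[0] < u[0]):
            yield o                     # no update for this key: copy it across
            o = next(old_it, None)
        else:
            key, data = u
            if o is not None and o[0] == key:
                o = next(old_it, None)  # old version is replaced or deleted
            if data is not None:
                yield (key, data)       # add or change
            u = next(upd_it, None)


old = [("a", "1"), ("c", "3"), ("e", "5")]
upd = [("b", "2"), ("c", None), ("e", "50"), ("f", "6")]
print(list(new_master(old, upd)))
```

Both inputs are consumed strictly in order, which is what made the job workable with one tape drive per file.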

Not at all! Medieval documents were routinely washed and reused.
I don't think that was their point. If you went to a bank in the 1940s and made a withdrawal, they wouldn't pull your account slip, erase the balance, and write in the new one. They would add a new line to the ledger noting a new balance. This is by design.
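The bank-ledger practice described here translates directly into an append-only data structure: each posting adds a line carrying the new running balance, and a mistake is fixed by posting a reversing entry rather than erasing the old line. The `Ledger` and `LedgerLine` names below are hypothetical, and this is a single-account sketch, not double-entry bookkeeping.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: a written line cannot be altered afterwards
class LedgerLine:
    date: str
    description: str
    amount: Decimal   # positive = deposit, negative = withdrawal
    balance: Decimal  # running balance after this line


class Ledger:
    """Hypothetical single-account ledger: lines are only ever appended."""

    def __init__(self, opening: Decimal = Decimal("0")):
        self._lines = [LedgerLine("", "opening balance", Decimal("0"), opening)]

    def post(self, date: str, description: str, amount: Decimal) -> LedgerLine:
        # New balance = previous line's balance + this amount; nothing is erased.
        line = LedgerLine(date, description, amount, self._lines[-1].balance + amount)
        self._lines.append(line)
        return line

    def reverse(self, date: str, original: LedgerLine) -> LedgerLine:
        # Corrections are new lines that cancel an earlier one.
        return self.post(date, f"reversal: {original.description}", -original.amount)

    @property
    def balance(self) -> Decimal:
        return self._lines[-1].balance

    def lines(self) -> tuple:
        return tuple(self._lines)


ledger = Ledger(Decimal("100.00"))
w = ledger.post("1940-03-01", "withdrawal", Decimal("-20.00"))
ledger.reverse("1940-03-02", w)
print(ledger.balance, len(ledger.lines()))
```

Using `Decimal` rather than `float` avoids binary rounding surprises with money amounts.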
Double-entry bookkeeping was a great advance.
And, in keeping with the flow of this thread, a complete luxury until the implements (paper/pens/tape/disk/silicon) became abundant and ubiquitous.
Paper and ink were good enough for centuries.
A reused manuscript page is called a palimpsest, and the scraped-off text can often be recovered. Some of Archimedes' writings actually survived this way: https://en.wikipedia.org/wiki/Archimedes_Palimpsest
Obligatory shout-out to the novella of the same name [1], by HN user (and, obviously, Actual Author) Charles Stross.

[1]: https://en.wikipedia.org/wiki/Palimpsest_(novella)

More recently, movie studios regularly destroyed old film to make room for new releases, causing old pictures, especially silent-era ones, to be lost forever.

Even NASA wiped the original Apollo 11 tapes to reuse them.

This enrages me every time I see it. It's even worse with old broadcast TV, especially daily serials/soaps. Shit is probably gone forever because the tapes were immediately wiped and reused. >:(

I know a lot of it stemmed from space and budget constraints but the complete lack of forward thinking everyone seems to have drives me bonkers.

I'm glad the Internet Archive exists. This current digital-only future is terrifyingly ephemeral.

I really wish someone would give the Archive a massive, multi-billion-dollar endowment to guarantee it could survive decades of operation with no further income.

And wax tablets, slates, etc. Pedantically, I suppose neither these nor parchment is really paper.
The topic of dealing with history in databases seems to go most of the way back to the beginning of the field. I'm still hoping a copy of "Bubenko (1977) The Temporal Dimension in Information Modelling" turns up on the web eventually as I'd love to read it.

The 1980 paper you linked is touched on briefly at the beginning of this Strange Loop talk on "Light and Adaptive Indexing for Immutable Databases (2022)": https://www.youtube.com/watch?v=Px-7TlceM5A

The no-overwrite storage architecture of Postgres from 1985 also took advantage of optical write-once read-many (WORM) drives developed in the late 1970s.
The idea of append-only storage is surely older than Pacioli.
I'm fairly certain data and records have been sewn into tapestries for thousands of years.