Hacker News
by CamperBob2 1063 days ago
1988? As far as anyone can tell, the use of the term "shard" in the context of database replication originated with Ultima Online, which was released in 1997, and which used the term in connection with its underlying mythos (the idea of representing world instances as shards of Mondain's shattered gem).

So a documented reference to sharding that's earlier than that would be interesting to see.

(Disagree? Instead of downvoting, consider posting a citation that actually resolves to a real paper.)

2 comments

You can find reports from before 1988 mentioning SHARD being in development, like the one from June 1986 linked in this sister comment: https://news.ycombinator.com/item?id=36849634
Some more, from 1989 [1][2][3], which again reference the missing "SHARD" paper but contain enough detail to make it clear that the idea of SHARD existed, regardless of the status of that particular document.

[1]: https://apps.dtic.mil/sti/tr/pdf/ADA214478.pdf

[2]: https://apps.dtic.mil/sti/tr/pdf/ADA216523.pdf

[3]: https://apps.dtic.mil/sti/tr/pdf/ADA209437.pdf

(I didn't downvote your comment)

"SHARD" is the name of the software - it was common back then to name systems using acronyms. It's not clear whether the paper/report actually uses the term "shard" in the sense that it is now used in distributed systems, or even whether it uses it at all.

One of the related papers I stumbled across, while not the SHARD paper, does go into a fair amount of detail about SHARD and the problem they were trying to address. One bit of verbiage here might be illuminating:

The new SHARD (System for Highly Available Replicated Data) system under development at Computer Corporation of America (CCA) is designed to address the problems described above. It provides highly available distributed data processing in the face of communication failures (including network partitions). It does not guarantee serializability, nor does it preserve integrity constraints, but it does guarantee many practical and interesting properties of the database.

The reader is referred to [SBK] for a detailed description of the architecture of the SHARD system. Briefly the main ideas are as follows. The network consists of a collection of nodes, each of which has a copy of the complete database. (Full replication is a simplifying assumption we have used for our initial prototype, many of our ideas seem extendible to the case of partial replication, but this extension remains to be made.) Replication allows transactions to be processed locally, thus reducing communication costs and delays, and providing high availability.

So it sounds to me like their main concern was availability through replication, and not so much horizontal scalability (which seems to be more the "point" of modern-day "sharding"). Still, I would claim that there is enough conceptual overlap to say that SHARD relates to the modern use of sharding in some sense, although it's hard to be sure without that original paper.
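
To make that distinction concrete, here is a toy Python sketch of the two ideas. It is not taken from the SHARD reports: the class names, the CRC32-modulo placement rule, and the synchronous write loop are all my own assumptions for illustration. Full replication puts every key on every node, so any node can answer a read locally. Modern-style sharding puts each key on exactly one node, so capacity grows with node count.

```python
import zlib


class FullyReplicatedStore:
    """Hypothetical model: every node holds a complete copy of the data."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def write(self, key, value):
        # Simplification: apply the write to every copy immediately.
        # A partition-tolerant system would propagate updates asynchronously.
        for node in self.nodes:
            node[key] = value

    def read(self, key, node_index):
        # Any node can serve the read from its local copy.
        return self.nodes[node_index].get(key)


class ShardedStore:
    """Hypothetical model: each key lives on exactly one node."""

    def __init__(self, node_count):
        self.nodes = [dict() for _ in range(node_count)]

    def owner(self, key):
        # Assumed placement rule: stable hash of the key, modulo node count.
        return zlib.crc32(key.encode("utf-8")) % len(self.nodes)

    def write(self, key, value):
        self.nodes[self.owner(key)][key] = value

    def read(self, key):
        # Only the owning node has the key.
        return self.nodes[self.owner(key)].get(key)


def total_stored(store):
    """Count entries held across all nodes of a store."""
    return sum(len(node) for node in store.nodes)


replicated = FullyReplicatedStore(3)
sharded = ShardedStore(3)
for key in ["a", "b", "c", "d"]:
    replicated.write(key, key.upper())
    sharded.write(key, key.upper())

print(total_stored(replicated))  # 12: four keys copied to each of three nodes
print(total_stored(sharded))     # 4: each key stored once
```

With three nodes, the replicated store survives the loss of any two nodes but stores everything three times; the sharded store stores everything once but loses part of the data if a node disappears. Real systems usually combine both.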