by dalf 935 days ago
Seafile (a file sync and storage tool) is inspired by git in how it stores files (internally there are repositories, branches and commits). However, the files are not stored directly:

> A file is further divided into blocks with variable lengths. We use Content Defined Chunking algorithm to divide file into blocks.

> This mechanism makes it possible to deduplicate data between different versions of frequently updated files, improving storage efficiency. It also enables transferring data to/from multiple servers in parallel.
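
To make the quoted idea concrete, here is a rough sketch of content-defined chunking in Python. It is not Seafile's actual code: the Gear-style rolling hash, the table seed, the size limits and the use of SHA-1 as the block name are all assumptions picked for illustration. The point is only that cut positions depend on nearby bytes rather than on file offsets.

```python
import hashlib
import random

# All constants below are illustrative assumptions, not Seafile's settings.
_rng = random.Random(0)                      # fixed seed: same table on every run
GEAR = [_rng.getrandbits(64) for _ in range(256)]
CUT_BITS = 12                                # a cut roughly every 4 KiB past MIN_SIZE
MIN_SIZE = 1024
MAX_SIZE = 16 * 1024
U64 = (1 << 64) - 1


def chunk(data: bytes) -> list[bytes]:
    """Split data at positions chosen by a rolling hash of the content."""
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        # Gear hash: each shift pushes older bytes out, so h depends only on
        # the last 64 bytes, not on where the chunk started.
        h = ((h << 1) + GEAR[byte]) & U64
        size = i - start + 1
        at_boundary = size >= MIN_SIZE and (h >> (64 - CUT_BITS)) == 0
        if at_boundary or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])          # trailing partial chunk
    return chunks


def block_ids(data: bytes) -> list[str]:
    """Name each block by a hash of its content, so equal blocks dedupe."""
    return [hashlib.sha1(c).hexdigest() for c in chunk(data)]


if __name__ == "__main__":
    rng = random.Random(1)
    v1 = bytes(rng.getrandbits(8) for _ in range(200_000))
    v2 = b"a few new bytes" + v1             # edit near the start of the file
    old, new = set(block_ids(v1)), set(block_ids(v2))
    print(f"{len(new & old)} of {len(new)} blocks in v2 already exist in v1")
```

With fixed-size blocks, prepending a few bytes would shift every block and nothing would match. Here the cut points normally fall back onto the same content after the edit, so most block IDs are shared between the two versions and only the changed blocks need to be stored or transferred.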

I run it on an old PC without issue. Drawback: since the files are not stored in the clear, I need a backup in case the Seafile repositories get corrupted (that has never happened to me).

* https://manual.seafile.com/develop/data_model/

* https://pdos.csail.mit.edu/papers/lbfs:sosp01/lbfs.pdf

3 comments

And in keeping with the topic, Seafile has a FUSE extension to access this storage system directly.

https://manual.seafile.com/extension/fuse/

Seafile is fantastic and I'm surprised I don't see more discussion about it around here. I've been running it on a VPS with MinIO as my object storage for about two years now, ~4TB of data, just shy of 100,000 files. It syncs fast, stable af, and I "own" all my data. Can't recommend it enough.

How does Seafile's chunking compare to git's packfiles, which can store binary deltas? [0] While git conceptually stores full files, that doesn't mean the actual implementation isn't more efficient.

[0] https://git-scm.com/docs/pack-format