CondensationDB: A general-purpose distributed database with end-to-end security (github.com)
94 points by Malexik 1972 days ago
8 comments

Wow - great to see more work in this area. And interesting to see that the Swiss Government is supporting.

We’ve been working on a graph/document database with this kind of collaborative revision-control features over at TerminusDB (https://github.com/terminusdb/terminusdb). I think it is the wave of the future. And interestingly, we also came out of a European Commission backed project.

Our approach to distribution is to use delta encoding and succinct data structures. We borrowed a fair few ideas from Git. You might be interested in reading our storage layer white paper: https://github.com/terminusdb/terminusdb/blob/master/docs/wh...

Conflict free merges sound fantastic - not an easy road! Good luck.

Thanks a lot. I have heard about TerminusDB before, and its capacity to do queries is interesting. That's something Condensation doesn't provide, because it works on the client side and, at least for now, there are no tools to do it on the store.

I would be very happy to keep in touch and you have to give me the secret of these European funds.

I'm afraid the European monies are all spent! That was in a former life, when we were a university project called ALIGNED. But it got us off the ground at least.

Very happy to stay in touch! We have a Discord server if you are interested.

This looks really neat, cannot wait to read the paper to see the details.

The other research db I saw recently I liked was sirix (https://github.com/sirixdb) - I can certainly see some room for cross-over between these two.

That's cool for querying. I will have a deeper look, thanks.
I'm not sure I have understood it correctly. It's a distributed database that syncs without conflict. So, cool for collaboration tools. What other things would you do with it?
Yes, it excels at synchronization. You could simply set up a synchronized server and you have your backup, or let each user of an application have their own server that is synchronized with the others (e.g., for a smart lock system). That's pretty useful for privacy, or if you have connectivity problems, as in a mesh network with interruptions.

For the app itself, it's really about getting end-to-end encryption and being able to use the app while offline without losing data.

People that are interested in a similar feature set should check out https://github.com/attic-labs/noms and the SQL fork of Noms, https://github.com/dolthub/dolt
Interesting one. I think the difference is that you don't really control where your data is; is it like IPFS? I am surprised they managed to have queries. I will dig deeper into that, thanks.
It's just a db, so your data is where the db is running. You could have a central db storing your data and your local client could sync periodically, or you can have a p2p architecture if you want. I use Noms as an embedded db in my project, where syncing happens between p2p nodes.
How does data synchronization and conflict resolution work in CondensationDB? I expected to see something about OT or CRDT since it says it can be used for building collaborative applications such as Google Docs (something like Docs is not possible without OT or CRDT).
Exactly, look at my reply to Fiahil a bit below.
Can you tell us a bit more about the "conflict free merge"? Is it based on CRDTs? How does it work?
Yes exactly, it's based on CRDTs, and there is a strategy to mark the entries with a timestamp to figure out which ones are the latest. An object may contain many entries, and when the client reads them they are simply compared one by one to find the union, or the latest version.

The beauty of it is that the algorithm decides how many entries to put in each object, to ensure that only the data that has changed is sent over the network and compared on the other client. That's why we call it Condensation.
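
As a rough illustration of the merge described above, here is a minimal Python sketch of a timestamp-based, last-writer-wins union of entries. It is not Condensation's actual code: the `Entry` fields, the `merge` function, and the tie-break on value are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    key: str
    value: str
    timestamp: int  # hypothetical revision time, e.g. in milliseconds


def merge(a: dict[str, Entry], b: dict[str, Entry]) -> dict[str, Entry]:
    """Union of two replicas; when both have a key, the newer entry wins."""
    merged = dict(a)
    for key, entry in b.items():
        current = merged.get(key)
        # Tie-break on the value so both sides always pick the same winner.
        if current is None or (entry.timestamp, entry.value) > (
            current.timestamp,
            current.value,
        ):
            merged[key] = entry
    return merged


alice = {"title": Entry("title", "Draft", 100), "owner": Entry("owner", "alice", 90)}
bob = {"title": Entry("title", "Final", 120), "tags": Entry("tags", "db", 95)}

# Merging in either direction gives the same state.
assert merge(alice, bob) == merge(bob, alice)
print(merge(alice, bob)["title"].value)  # prints "Final"
```

Because the result does not depend on the merge order, two clients that exchange their entries end up in the same state without coordinating.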

So, it's like a giant "grow only set" shared between as many nodes as you want and sending diffs on the network? I find it difficult to understand the possible applications* and the link with the article in the readme (https://www.inkandswitch.com/local-first.html).

*: I see that it could be used as a backbone for an end-to-end encrypted messaging system, but what would it change for me if I were reading a remote API, running a multi-instance web app or parsing CSVs to train ML models on them?

It's not really grow-only: the data is immutable, but it expires at some point according to the rules you want to implement.

The entirety of the data is not shared on the network: each user owns their data and chooses to store it on the server they want. It's really like the email system.

As for the article, it really matches its conclusion: you could build Google Docs, an IoT system, or anything else, and you would inherit powerful synchronization, encryption, and offline mode, so you can satisfy the 7 principles the author of the article characterized.

You could just use a cloud to store your data in bulk, and pass it through a local server to make sure it is not compromised.

> It's really like the email system

Ah! I see where your inspiration is coming from! It's really interesting when you think of it that way!

I think there is a true need for a decentralized immutable data store (basically get/put/list operations), and it could serve a vast number of use cases. It enables simple algorithmic memoization, complete reproducibility of ML models, and dataset shareability. The only problem is we lack practical solutions. Maybe Condensation could help, or maybe not. In any case, it's good to see more alternatives to traditional datastores that are not "git for data".
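
To make the get/put/list idea concrete, here is a minimal Python sketch of an in-memory, content-addressed immutable store with memoization on top. Everything in it (`ImmutableStore`, `list_keys`, `memoized`, the JSON encoding) is hypothetical and only illustrates the idea; it says nothing about how Condensation stores work.

```python
import hashlib
import json


class ImmutableStore:
    """Toy content-addressed store: the key is the SHA-256 of the data."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        key = hashlib.sha256(data).hexdigest()
        self._blobs.setdefault(key, data)  # same data, same key: never overwritten
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def list_keys(self) -> list[str]:
        return sorted(self._blobs)


def memoized(store, index, func, *args):
    """Run func(*args) once; later calls load the stored result instead."""
    call = json.dumps([func.__name__, args]).encode()
    call_key = hashlib.sha256(call).hexdigest()
    if call_key not in index:
        result = func(*args)
        index[call_key] = store.put(json.dumps(result).encode())
    return json.loads(store.get(index[call_key]))


calls = []


def square(x):
    calls.append(x)  # record each real execution
    return x * x


store, index = ImmutableStore(), {}
assert memoized(store, index, square, 12) == 144
assert memoized(store, index, square, 12) == 144  # served from the store
assert calls == [12]  # the function body ran only once
```

Since a key is derived from the content, anyone holding the key can check that the data they received is the data that was stored, which is what makes sharing and reproducibility straightforward.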

Yes, immutable data stores are already working fine, as you can see here: https://condensation.io/stores/store/

And yes, it really ties in with your point about horizontal scaling and sharing data in a controlled and secure manner.

Author and contributors are here for your questions
Hi, I tried running the example from https://github.com/CondensationDB/Condensation-java/tree/mas... with the HTTP store from https://condensation.io/servers/php-files/. However, it seems that the latter is outdated. Is there a functioning store available?
Hey, can you send us an email? I can connect you to Thomas for the installation of the latest version. It will be a good exercise to guide you through.
What's the story for deletion? Is that done through something like merge(record, delete_token) -> delete_token?
Basically you have a document with all the references to objects, and if you remove the reference the object will be deleted after a certain timeout (you can set it for your specific case).
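
A minimal Python sketch of that idea, assuming a store that deletes an object once it has been unreferenced for a configurable timeout. The `Store` class, its `collect` method, and the explicit `now` argument are made up for illustration and are not the Condensation API.

```python
class Store:
    """Toy object store: unreferenced objects expire after a grace period."""

    def __init__(self, timeout: float) -> None:
        self.timeout = timeout
        self.objects: dict[str, bytes] = {}
        self.unreferenced_since: dict[str, float] = {}

    def put(self, object_id: str, data: bytes) -> None:
        self.objects[object_id] = data

    def collect(self, referenced: set[str], now: float) -> None:
        """Delete objects that have been unreferenced for at least `timeout`."""
        for object_id in list(self.objects):
            if object_id in referenced:
                # Still (or again) referenced: reset its expiry clock.
                self.unreferenced_since.pop(object_id, None)
                continue
            first_seen = self.unreferenced_since.setdefault(object_id, now)
            if now - first_seen >= self.timeout:
                del self.objects[object_id]
                del self.unreferenced_since[object_id]


store = Store(timeout=60)
store.put("a", b"first")
store.put("b", b"second")

document = {"a", "b"}            # the document references both objects
store.collect(document, now=0)

document.discard("b")            # "deleting" means dropping the reference
store.collect(document, now=10)  # b is first seen unreferenced here
store.collect(document, now=50)  # only 40 s elapsed: b is kept
assert "b" in store.objects
store.collect(document, now=70)  # 60 s elapsed: b is removed
assert "b" not in store.objects and "a" in store.objects
```

The grace period means a reference that is restored before the timeout (for example by a client that was offline) keeps the object alive.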
I find this quite interesting, but do all my development in the C/C++ ecosystem. Is there anything similar there?
http://doc.replicated.cc/%5EWiki/ron.sm

RON's reference implementation is C++. Op based sync, CRDT types inside. RON per se is the notation everything is built around.

Interesting one. Do you know how this scales, given that the ID is set on each entry?
It has a bunch of tricks to compress the metadata, yes.
Not yet planned, but it's definitely something we want to have. We onboard everyone who wants to port the code, so I hope someone will come with this idea soon.
Could you introduce yourselves, please? Because the accounts look like throwaways. That's weird.
Yes, basically Thomas, the main author, is here https://viereck.ch/thomas/ and I have been working with him for 7 years https://www.linkedin.com/in/alex-mouradian/

We are used to working without being connected. I get your point; we will put a presentation of ourselves on the GitHub page or the website.

Interesting. Would love to try this out, but I generally try to avoid Java for personal side projects. Is the plan to make a JavaScript client or a full JavaScript port?
Thanks. Yes, it may start this week thanks to the enthusiasm of a contributor we just met; he has begun helping to port it to TypeScript/JavaScript.
Nice job