Hacker News new | ask | show | jobs
by billconan 2252 days ago
I want to build a reddit like community using orbitdb. But orbitdb can't freely add and remove user permissions. When will this feature be implemented?
2 comments

This is an open problem. It might be surprising to find out that it's quite difficult.

CRDTs usually work as last-write-wins, meaning that if you have a key-value store, the last update to update a key 'wins' the value via the way oplog reduction works.

If you reverse that to a FIRST-write-wins log, you can grant permissions and ownership on a first-come, first-serve basis. Revocation, then, becomes the issue. What do you do with the records they already have? Questions like that are plentiful.

The approach most people take is to find workarounds or "good enough" solutions here, either by using encryption and allowing the encrypted data to be public, or by using some sort of other OrbitDB store as their ACL and management, and only giving select keys access to write to said ACL store in the first place.

Adding encryption into the mix though, particularly multi-writer, becomes exponentially harder.

> CRDTs usually work as last-write-wins

Um.

Ah, sorry, yes I stepped on a rake and whacked myself in the face here.

What I meant, since LWW is nomenclature for an alternative to CRDTs, is that the last writer by _logical clock_ in a CRDT, not by _wall clock_ time, will "win" the key.

> LWW is nomenclature for an alternative to CRDTs

I think you're still stepping on rakes...

Can you elaborate?
> Last-Writer-Wins is a conflict resolution strategy that can be used by any kind of data type that needs conflicts resolved, CRDTs included. Unfortunately it's not a very good one: even if you use vector clocks instead of wall clocks, it doesn't give you much stronger guarantees than determinism. That is, given two concurrent writes, the winner is essentially arbitrary. LWW is a merge strategy of last resort; if that's the only thing your CRDT system offers, I'm not sure it's really fair to call it a CRDT system.

Can't reply to the comment below, so replying here.

I believe what markhenderson was trying to say is that in OrbitDB, the default merge strategy for concurrent operations is LWW.

The comment above is conflating a lot of things here. 1) determinism is exactly the guarantee one needs for CRDTs, and I'd argue generally is a good thing in distributed system but 2) adding vector clocks (OrbitDB uses Lamport clocks, or Merkle Clocks [1], by default), nor wall clocks, have nothing to do with determinism and in fact there's a good reason to not use vector clocks by default: they grow unbounded in a system where users (=IDs) are not known. In my experience, LWW is a good baseline merge strategy.

I don't think it's at all correct to say that "the winner is essentially arbitrary" because it's not. The "last" in LWW can be determined based on any number of facts. For example "in case of concurrent operations, always take the one that is written by the ID of the user's mobile device", or "in case of concurrent operations, always take the one that <your preferred time/ordering service> says should come first". It'd be more correct say "the winner is based on the logical time ordering function, which may not be chronological, real world time order".

As for the last comment, I'm pretty sure it's a CRDT system :) Want to elaborate your reasoning why you think it's not a CRDT?

[1] "Merkle-CRDTs: Merkle-DAGs meet CRDTs" - https://arxiv.org/abs/2004.00107

OK, I've read the paper; can you help me reason through a scenario?

As I understand it, the Merkle-CRDT represents a Merkle tree as a grow-only set of 3-tuples. When you add a new event to the thing (as a tuple) you have to reference all of the current concurrent root nodes of the data structure, in effect becoming the new single root node; and your event data, which must be a CRDT, gets merged with the CRDTs of those root nodes. Do I have it right so far?

Assuming yes, let's say you have causality chain like so:

    1 --> 2 --> 3 --> 4 
           `--> 5 --> 6
Two root nodes, 4 and 6. Two concurrent histories, 3-4 and 5-6. It's time to write a new value, so I create a new tuple with references to 4 and 6, and merge their CRDT values. Last Writer Wins, right? So either 4 or 6 dominates the other. Whoever was in the other causal history just... lost their writes?
almost! :) let me elaborate on few points.

> you have to reference all of the current concurrent root nodes of the data structure, in effect becoming the new single root node

correct, and more precisely the union of heads is the current "single root node". in practise, and this is where the merge strategy comes in, the "latest value" is the value of the event that is "last" (as per LWW sorting).

> and your event data, which must be a CRDT, gets merged with the CRDTs of those root nodes.

the event data itself doesn't have to be a CRDT, can be any data structure. the "root nodes" (meaning the heads of the log) don't get merged with the "event data" (assuming you mean the database/model layer on top of the log), the merge strategy of the log picks the "last/latest event data" to be the latest value of your data structure.

> It's time to write a new value, so I create a new tuple with references to 4 and 6, and merge their CRDT values.

when a new value is written, correct that the references to 4 and 6 are stored, but the new value doesn't merge the values of the previous events and rather, it's a new value of its own. it may replace the value from one or both of the previous events, but that depends on the data model (layer up from the log).

  1 --> 2 --> 3 --> 4 
         `--> 5 --> 6
> Last Writer Wins, right? So either 4 or 6 dominates the other. Whoever was in the other causal history just... lost their writes?

no writes are lost. the result in your example depends what 4 and 6 refer to. in a log database, the ordered log would be eg. 1<-2<-3<-5<-4<-6, so all values are preserved. in the case of a key-value store, it could be that 4 is a set operation to key a and 6 is a set operation to key b, thus the writes don't effect each other. if 4 and 6 are both a set operation on key a, it would mean that key a would have the value from 6 and the next write to key a would overwrite the value in a. makes sense?

Last-Writer-Wins is a conflict resolution strategy that can be used by any kind of data type that needs conflicts resolved, CRDTs included. Unfortunately it's not a very good one: even if you use vector clocks instead of wall clocks, it doesn't give you much stronger guarantees than determinism. That is, given two concurrent writes, the winner is essentially arbitrary. LWW is a merge strategy of last resort; if that's the only thing your CRDT system offers, I'm not sure it's really fair to call it a CRDT system.
You can build this using 3Box, which has extended OrbitDB with DID-based access control system and user permissions. Check it out here: https://docs.3box.io/build/web-apps/messaging

3Box also has support for members only OrbitDB threads which can restrict posting to members, and encrypted OrbitDB threads to make posts private to the group. To Mark's point above.

That's intersting! In the case of persistent threads as mentioned in the link, can the set of moderators be mutable, and still have eventual consistency?
Yes, the list of moderators is mutable. It's addition-only out of the box, but to create a system where removing moderators is possible, you can create a new thread with the new set of mods (minus the one you removed) and the first entry in that new thread can reference the old thread. This model gives the new set of mods forward control, but the old content will still have the old set of mods.

This also works for encryption in members threads. To remove a member, you can create a new thread without the member and link the original thread. This gives forward secrecy since new encryption keys are generated for the new thread.

We're working on improving this system over time, but it requires some more advanced cryptography such as proxy re-encryption (like nucypher).

There ya go!