| I have so many questions about this. Much of the architecture seems off to me. I like the concept, but it doesn't seem as secure as it could be. For the README, I'd hope to find a bit more information about the way data is stored and transmitted. For example, this seems to just be a SQLite database with values in fields? Is there a separate encryption key for the database itself? Otherwise anyone with access to the file would be able to see all data stored? The encryption key is only used to encrypt data in transit, but not at rest? And then you're encrypting the full JSON blob instead of only the values? This seems risky to me. What is the purpose of the ProcessID? It is randomly generated and stored in the database (thus used by all clients too). So, I'm not sure what this is for? I see it's used to resolve conflicts, but these should probably be given out by the server? Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too. Finally, I don't understand why you're using plain HTTP (no TLS) for communication b/w client and server. I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data. This would have been a great use-case for a simple (non-HTTP/JSON) TCP server: >>> AUTHTOKEN xxx
>>> SET $KEY $LEN $SHA1
>>> <bytes>
<<< OK
>>> AUTHTOKEN xxx
>>> GET $KEY
<<< $LEN $SHA1
<<< <bytes>
Custom protocols have their own security issues, but it can also be easier to see where there are potential issues (like unmarshalling unvalidated blobs). If you wrap something like the above in TLS-PSK, you're set. If you want to use encryption for a session (after you authenticate), that's possible too, but you're at risk of effectively re-creating TLS. |
> this seems to just be a SQLite database with values in fields?
Sqlite is used as a storage format ("SQLite competes with fopen()"). The key-value pairs are stored as a modified Append-Only CRDT. The LUB-Operation (to merge to states while syncing) is implemented here: https://github.com/maxmunzel/kvass/blob/e32fdabdc86b039f716c...
> anyone with access to the file would be able to see all data stored?
Yes, attackers with access to your fs are not part of my attacker model. I rely on disk encryption for that matter.
> Do the clients cache data locally? It looks like you're basically syncing from the server for every request. You're already making a round trip to the server for a request anyway, so why not keep state only on the server? I can understand an offline-only mode, but this would require a significantly more robust sync mechanism. If this was the goal, I'd love to see this discussed more in the README too.
The sync mechanism is actually pretty solid, as its based on CRDTs. One of the applications of kvass is central management of config files, so automatic syncing and offline fallback are important.
> What is the purpose of the ProcessID?
The Counter Variable implements a rudimentary implementation of Lamport clocks. To get a total order from Lamport clocks, you need ordered, distinct process ids. The process id's don't really need to mean anything and the Lamport clock is itself just a fallback for the case that the wall-clock timestamps collide (see the Max() function), so it's practical to just draw them randomly.
> I didn't see any authn/authz in the requests. You're also unmarshalling random data from the request w/o confirming that it is valid first. This seems risky to me and could potentially crash the server if I were to send it random data.
Authentication is provided by the GCM mode of AES. As I decrypt (and thereby verify) early, I can assume to work on trustworthy payloads. GCM is also non-malleable unlike for example CBC or CTR.
As suggested by losfair, I'll switch to PSK TLS as soon as it's available or just put HTTPS in front of the end-points. But that's not high-priority right now.