by throwaway_bad 2423 days ago
Having client state just be a replica of server state solves so many problems I don't understand why the concept never caught on. Pouchdb/couchdb are still the only ones doing it afaik.

Instead we have a bajillion layers of CRUD all in slightly different protocols just to do the same read or write to the database.

11 comments

When your data is public and immutable, this approach is very pleasant. The client becomes just another caching layer and worst case it's presenting a historical version of the truth. You can even extend this across tabs with things like local storage.

This breaks down quickly once you have data that could become private or mutate rather than append.

This is what stopped me from exploring couchdb further.

The "one db per user" model for private data made using other features like views more difficult when you have to upgrade, edit, or remove them.

Mutability wasn't really a problem: either present the conflicts to the user and let them pick one, or write code to merge them if possible.
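
A minimal sketch of the "write code to merge" option, assuming documents are plain dicts and a common base revision is available. `merge_revisions` and its field-level rule are hypothetical, not a CouchDB API. Fields changed by only one side merge automatically; fields changed by both sides are handed back for the user to pick.

```python
def merge_revisions(base, mine, theirs):
    """Three-way merge of two conflicting revisions of one document.

    Returns (merged, conflicts). A field changed by only one side is taken
    automatically. A field changed by both sides to different values is
    listed in `conflicts` and keeps its base value until the user picks one.
    A value of None is treated the same as a missing field.
    """
    merged = {}
    conflicts = []
    for key in sorted(set(base) | set(mine) | set(theirs)):
        b, m, t = base.get(key), mine.get(key), theirs.get(key)
        if m == t:            # both sides agree (or neither changed it)
            value = m
        elif m == b:          # only "theirs" changed this field
            value = t
        elif t == b:          # only "mine" changed this field
            value = m
        else:                 # both changed it differently: real conflict
            value = b
            conflicts.append(key)
        if value is not None:
            merged[key] = value
    return merged, conflicts


base = {"title": "Draft", "body": "hello", "tags": "a"}
mine = {"title": "Final", "body": "hello", "tags": "b"}
theirs = {"title": "Draft", "body": "hello world", "tags": "c"}
merged, conflicts = merge_revisions(base, mine, theirs)
```

Here `title` and `body` merge cleanly, while `tags` ends up in `conflicts` and would be shown to the user.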

Couchbase (with its sync gateway) uses "channels" to sync data. It even lets you change the channel of a doc, and to the sync client it shows as if the doc was deleted (if the user is not part of the new channel)
Yeah, couchdb more or less requires you to replicate data for individual users if you need complex permissions and want the user to access the couchdb directly.

Permissions in general need to be handled by custom reconciliation functions (dropping unauthorized changes) or some kind of nanny system that can react to changes.

For example, imagine blog posts as documents, and a list of comments inside that document. Instead of the user adding/changing the comment list, the user would add a record to a comment request list, and either the reconciliation process or a nanny service checks the requests and updates the comment list.
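
A rough sketch of that nanny pass, assuming the post and the request list are plain dicts and lists. `process_comment_requests` and the `is_allowed` check are made-up names for illustration, not anything CouchDB provides.

```python
def process_comment_requests(post, requests, is_allowed):
    """Nanny pass: move authorized requests into the post's comment list.

    `post` is the blog-post document, `requests` the user-writable request
    list, and `is_allowed(user, post)` a hypothetical permission check.
    Returns the rejected requests so they can be logged or dropped.
    """
    rejected = []
    for req in requests:
        if is_allowed(req["user"], post):
            post["comments"].append({"user": req["user"], "text": req["text"]})
        else:
            rejected.append(req)
    requests.clear()  # every request has been handled one way or the other
    return rejected


post = {"_id": "post-1", "banned": ["mallory"], "comments": []}
requests = [
    {"user": "alice", "text": "Nice post"},
    {"user": "mallory", "text": "spam"},
]
rejected = process_comment_requests(
    post, requests, lambda user, p: user not in p["banned"]
)
```

The point of the design is that users only ever write to the request list; the comment list itself is written by trusted code.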

The much simpler solution of course is to not let the users have any write access to the couchdb and just use a REST API. But then you lose many of the benefits of couchdb...

I think the Firebase Realtime Database and Firestore have a good model for offline and being able to have private and mutable data. It does get complex but the Firebase SDKs do the heavy lifting for you here.
Can you share more about the private and mutable data functionality for Firebase? I used them a few years back, and never really understood how to do private data without building my own ACL inside Firebase.
Here are a couple of specific examples: https://medium.com/firebase-developers/patterns-for-security... https://firebase.google.com/docs/firestore/solutions/role-ba...

You do need to have your ACL data also stored in the database, which can be a hassle if you already have an existing ACL system built outside of Firebase.

xrd, do you have a specific question about private data in Firestore?

Thanks for the response. I suppose my biggest question is not how I can store private data (I can just make it inaccessible via the right rules). But, it seems like I am then layering my own ACL system onto those rules. And, I never got a sense there was an easy way to write a test that simulated my rules against my data and made sure I was not accidentally creating a leaky rule.

In so many ways it is SO much easier to use Firebase because all the pieces are right there as compared to a DB + Server + Front End + Tooling. But, I still always worried that I would somehow leave a gaping hole in my data and not know about it.

And, I was never really sure how I can easily do joins across data without writing my own bespoke metalanguage inside Firebase. A link posted today on HN talked about how XML does turn out to be good for nested data (hence the reason it is used for UIs), and it feels like Firebase, being more or less JSON, loses in this respect.

That's just my experiences, and I say that loving Firebase.

Those two examples made me think: Firebase removes a lot of complexity for me, but it forces me to write my own layer of complex access and DB logic which I never felt fully qualified to do, and as such, just went back to using databases with an ORM and a backend server.

In reality data can't become private again, after being available. You may try to contact all users to delete their copy, but they may not respect that.
That's in theory. In actual reality, if you're making an app that has private data and is not crawled by bots, most of the time users don't save everything that they see.
I think the parent means clients (user agents), not actual human users. As soon as any state that needs to be hidden is exposed to clients, that's a security breach regardless of whether any human eyeballs have seen it.
This is true, too, but I meant users, too. It's all too common for users to screenshot things, etc. I see it a lot of the time on Twitter for example. You can delete tweets, but oftentimes it's pointless. But I also regularly see my gf taking photos with her phone of various apps she uses on her notebook, just to have the info around on her phone. I suspect it's pretty common, because it's much more low-tech than saving pages or API scraping.
In short, server data is more normalized than the data the client needs. As you get closer to the view layer, your data gets denormalized further and further. Client-server interaction sits somewhere in the middle to minimize both the bytes over the wire and the round trips to the backend needed to get up to date.

Take a look at GraphQL: its central promise is to let the client choose the optimal data it needs (often denormalized through nested GraphQL queries) and send it in one batch.

This is not to say there shouldn't be a simple replica. It is just that if we want a simple replica, we should have a server-side, mirrored, somewhat denormalized representation rather than just the raw server data models.

Data security is a huge issue. Facebook.com has very specific whitelisted access patterns encoded as CRUD endpoints. 1) Users want to make sure their data is used in non-creepy or non-stalky ways; 2) Facebook's business needs to control the access point so they can serve you ads or otherwise monetize. So the API exposes only limited access patterns, the API TOS disallows caching, and they go to great lengths to prevent scrapers.

If immutable fact/datom streams with idealized cache infrastructure become a thing (and architecturally I hope they do), it's going to need DRM to be accepted by both users and businesses.

> Having client state just be a replica of server state solves so many problems I don't understand why the concept never caught on.

As soon as your server state is larger than whatever your client can handle, the whole metaphor breaks down.

I'm assuming that the client state would be a lazy representation of the back end state, that only pulls data as needed. The result being that local and server state must both be treated only as asynchronously accessible.
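
A small sketch of what that could look like, assuming an async `fetch` function standing in for the backend. `LazyReplica` and `fake_server_fetch` are hypothetical names; the only point is that reads are async whether or not the data is already local.

```python
import asyncio


class LazyReplica:
    """Local cache that pulls documents from the server only when asked."""

    def __init__(self, fetch):
        self._fetch = fetch      # async callable: doc_id -> document
        self._local = {}         # documents replicated so far
        self.fetch_count = 0

    async def get(self, doc_id):
        # Reads are async even on a cache hit, so callers never need to
        # know whether the data was already local.
        if doc_id not in self._local:
            self.fetch_count += 1
            self._local[doc_id] = await self._fetch(doc_id)
        return self._local[doc_id]


async def fake_server_fetch(doc_id):
    # Stand-in for a network round trip (hypothetical backend).
    await asyncio.sleep(0)
    return {"_id": doc_id, "body": "content of " + doc_id}


async def demo():
    replica = LazyReplica(fake_server_fetch)
    first = await replica.get("doc-1")
    second = await replica.get("doc-1")   # served locally, no second fetch
    return first, second, replica.fetch_count


first, second, fetch_count = asyncio.run(demo())
```

This sketch ignores invalidation entirely; a real replica would also need some way to learn that the server copy changed.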
Firestore is doing something very similar and it is very easy to use.
There is a limit on what data can be replicated with the client. For a simple chat-app you can replicate all messages of a user. But you would never replicate the whole state of wikipedia to make it searchable.
MeteorJS tried it and failed miserably. It doesn't scale.
Wrong. This is popular anti-Meteor FUD spread by people who don't know how to use its features properly or have the engineering/computer science background to design a system to be able to manage computational complexity or scalability.

In 2015, my business implemented a Meteor-based real-time vehicle tracking app utilising Blaze, Iron Router, DDP, and Pub/Sub.

Our Meteor app runs 24hrs/day and handles hundreds of drivers tracking in every few seconds whilst publishing real-time updates and reports to many connected clients. Yes, this means Pub/Sub and DDP.

This is easily being handled by a single Node.js process on a commodity Linux server consuming a fraction of a single core’s available CPU power during peak periods, using only several hundred megabytes of RAM.

How was this achieved?

We chose to use Meteor with MySQL instead of MongoDB. When using the Meteor MySQL package, reactivity is triggered by the MySQL binary log instead of the MongoDB oplog. The MySQL package provides finer-grained control over reactivity by allowing you to provide your own custom trigger functions.

Accordingly, we put a lot of thought into our MySQL schema design and coded our custom trigger functions to be as selective as possible, so that SQL queries are not needlessly executed and CPU, IO and network bandwidth are not wasted publishing redundant updates to the client.
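
To illustrate the idea of a selective trigger in general terms (this is not the Meteor MySQL package's API; the `change` dict shape, `make_trigger`, and the table and column names are all assumptions made for the sketch):

```python
def make_trigger(table, columns):
    """Build a predicate deciding whether a row change should rerun a query.

    `change` is a hypothetical dict: {"table", "before", "after"}.
    """
    def trigger(change):
        if change["table"] != table:
            return False
        # Only fire when one of the columns the query depends on changed.
        return any(
            change["before"].get(col) != change["after"].get(col)
            for col in columns
        )
    return trigger


position_changed = make_trigger("vehicles", ["lat", "lng"])
moved = {"table": "vehicles",
         "before": {"lat": 1.0, "lng": 2.0, "fuel": 50},
         "after": {"lat": 1.1, "lng": 2.0, "fuel": 50}}
fuel_only = {"table": "vehicles",
             "before": {"lat": 1.0, "lng": 2.0, "fuel": 50},
             "after": {"lat": 1.0, "lng": 2.0, "fuel": 49}}
```

A position query would rerun for `moved` but not for `fuel_only`, which is the kind of redundant work the selective triggers are meant to avoid.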

In terms of scalability in general, are we limited to a single Node.js process? Absolutely not - we use Nginx to terminate the connection from the client and spread the load across multiple Node.js processes. Similarly, MySQL master-slave replication allows us to spread the load across a cluster of servers.

For those using MongoDB, a Meteor package named RedisOplog provides improved scalability with the assistance of Redis's pub/sub functionality.

Hey Vlad, very cool to hear you're still using this stuff. I've seen the notifications on the latest with Zongji. [0] Kudos for keeping it going!

https://github.com/nevill/zongji

> RxDB can do a realtime replication with any CouchDB compliant endpoint and also with GraphQL endpoints.

PouchDB is mentioned a couple of times, including one “PouchDB compatible” mention. Wondering what unique use cases RxDB supports?

That's exactly the model that the Apollo Client library uses (a GraphQL-based data store for React), and the teams I spoke with that tried it are quite enthusiastic for this reason.
How would that work, with, say Twitter? Copy all tweets ever to each browser?

I mean, that's why it didn't catch on right :-) It's hard :-)

Same way you scale replication for any server, by sharding and only replicating the shards you care about.

The "shard" could just be that user's own feed in this case. Then you get offline for free: the user adds a tweet and it appears immediately, replicating back to the server when they go back online. The server replica side will need to be a lot more complicated to deal with broadcasting, but I don't see why it won't work.
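
A toy sketch of the client side of that, assuming the server is just a dict mapping each user to their feed shard. `FeedReplica` and its outbox are hypothetical, and broadcasting to followers is left out entirely.

```python
class FeedReplica:
    """Client-side replica of a single shard: one user's own feed."""

    def __init__(self, user):
        self.user = user
        self.feed = []       # what the UI renders
        self.outbox = []     # local writes not yet replicated to the server
        self.online = False

    def add_tweet(self, text, server):
        tweet = {"user": self.user, "text": text}
        self.feed.append(tweet)      # appears immediately, even offline
        self.outbox.append(tweet)
        if self.online:
            self.flush(server)

    def flush(self, server):
        # Replicate queued writes back to the server's copy of this shard.
        shard = server.setdefault(self.user, [])
        shard.extend(self.outbox)
        self.outbox.clear()

    def go_online(self, server):
        self.online = True
        self.flush(server)


server = {}                      # hypothetical server state: user -> feed shard
client = FeedReplica("alice")
client.add_tweet("written offline", server)
visible_offline = list(client.feed)
server_before = dict(server)
client.go_online(server)
```

The tweet is visible locally before the server has seen it, and reaches the server's shard once the client comes back online.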

I attempted something similar on a current project; the problem is with initial data loading. If you are hitting the URL/page for the first time, you are waiting minutes or more for non-trivial data sets.

Why not just load what is needed and hydrate the data over time? What about datasets where you need pagination/ordering etc. And the only way to guarantee order is to pull the whole set?

In twitter-like applications, the default ordering is usually just ORDER BY timestamp DESC. You could rely on this default ordering to load the first few dozen items on first visit, and load the remainder asynchronously. Sort of like automatic infinite scrolling.
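
Roughly, assuming each item carries a `ts` timestamp (the function name and batch sizes here are arbitrary choices for the sketch):

```python
def load_timeline(items, first_page=3, batch=2):
    """Yield batches of items in ORDER BY timestamp DESC order.

    The first batch is what the initial render needs; later batches are the
    backfill a client could fetch in the background.
    """
    ordered = sorted(items, key=lambda item: item["ts"], reverse=True)
    yield ordered[:first_page]
    for start in range(first_page, len(ordered), batch):
        yield ordered[start:start + batch]


items = [{"id": i, "ts": 100 + i} for i in range(6)]
batches = [[item["id"] for item in b] for b in load_timeline(items)]
```

In a real client the sort would happen on the server and each later batch would be its own request; the generator only shows the order in which data would arrive.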

Of course, users with limited RAM and metered connections won't like that. Which is another reason why it didn't catch on.

I've heard somewhere that Twitter maintains a copy of all tweets in a user's timeline for that user.

If I had to make a Twitter clone with CouchDB, I would probably have one timeline document per user, and maybe one per day to limit the syncing bandwidth.
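
As a sketch of that layout, assuming tweets carry a Unix timestamp `ts` and days are cut in UTC. The `timeline:<user>:<day>` ID scheme is made up for illustration, not a CouchDB convention.

```python
from datetime import datetime, timezone


def timeline_doc_id(user, ts):
    """Document ID for the timeline bucket holding a tweet posted at `ts`."""
    day = datetime.fromtimestamp(ts, tz=timezone.utc).strftime("%Y-%m-%d")
    return f"timeline:{user}:{day}"


def bucket_tweets(user, tweets):
    """Group tweets into one document per user per UTC day."""
    docs = {}
    for tweet in tweets:
        doc_id = timeline_doc_id(user, tweet["ts"])
        doc = docs.setdefault(doc_id, {"_id": doc_id, "tweets": []})
        doc["tweets"].append(tweet)
    return docs


tweets = [
    {"ts": 0, "text": "first"},
    {"ts": 3600, "text": "same day"},
    {"ts": 86400, "text": "next day"},
]
docs = bucket_tweets("alice", tweets)
```

With day-sized documents, a client that is one day behind only has to sync the newest bucket instead of the whole timeline.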

Security is a huge deal