> 1. What to do with stale user data? What happens if a user doesn't open the app for a year? How do you handle migrations?

    version = db.query("select value from config where key='version'").fetch_one()
    switch (version) {
    case 1:
        db.migrate_to_version_2()
        fallthrough
    case 2:
        db.migrate_to_version_3()
        // ... and so on
    }
    // re-read the version; this assumes each migration bumps the stored value
    version = db.query("select value from config where key='version'").fetch_one()
    assert(version == 3)
    start_sync()
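For what it's worth, here is roughly the same ladder as a runnable sketch in Python with the standard `sqlite3` module. The `items` and `tags` tables, the two migration bodies, and the helper names (`migrate`, `can_sync`, `make_v1_database`) are made up for illustration; only the `config` table with a `version` key comes from the pseudocode above. It assumes the connection is opened with `isolation_level=None` so the explicit `BEGIN`/`COMMIT` control the transaction.

```python
import sqlite3

LATEST_VERSION = 3

def migrate_1_to_2(db):
    # Hypothetical change introduced in schema version 2.
    db.execute("ALTER TABLE items ADD COLUMN updated_at TEXT")

def migrate_2_to_3(db):
    # Hypothetical change introduced in schema version 3.
    db.execute("CREATE TABLE tags (item_id INTEGER, name TEXT)")

# One entry per historical version. Old entries are never deleted.
MIGRATIONS = {1: migrate_1_to_2, 2: migrate_2_to_3}

def read_version(db):
    row = db.execute("SELECT value FROM config WHERE key = 'version'").fetchone()
    return int(row[0])

def migrate(db):
    """Walk the ladder from the stored version up to LATEST_VERSION."""
    version = read_version(db)
    while version < LATEST_VERSION:
        db.execute("BEGIN")
        try:
            MIGRATIONS[version](db)
            # The schema change and the version bump commit together.
            db.execute(
                "UPDATE config SET value = ? WHERE key = 'version'",
                (str(version + 1),),
            )
            db.execute("COMMIT")
        except Exception:
            db.execute("ROLLBACK")
            raise
        version += 1
    return version

def can_sync(db):
    # Refuse to sync unless the device is on the latest schema version.
    return read_version(db) == LATEST_VERSION

def make_v1_database():
    # Stand-in for a device that has not opened the app since schema version 1.
    db = sqlite3.connect(":memory:", isolation_level=None)
    db.execute("CREATE TABLE config (key TEXT PRIMARY KEY, value TEXT)")
    db.execute("INSERT INTO config VALUES ('version', '1')")
    db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, title TEXT)")
    return db
```

Calling `migrate(make_v1_database())` runs both steps in order. Calling it again on an up-to-date database does nothing.
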
Just don't delete the old cases. Refuse to run sync if device is not on the latest schema version.

One of my Django projects started in 2018 and has over 150 migration files, some involving major schema refactors (including introducing multi-tenancy). I can take a DB dump from 2018, migrate it, and have the app run against master, without any manual fixes. I don't think it's an unsolved problem.

> 2. What about data corruption? What happens if the user has a network interruption during a sync? How do you handle partial states?

Run the sync in a transaction.

> 3. What happens when you have merge conflicts during a sync? CRDT structures are not even close to enough for this.

CRDTs are probably the best we have so far, but what you should do depends on the application. You may have to ask the user to pick one of the possible resolutions. "Keep version A, version B, or both?"

> 4. What happens when the user has millions of items? How do you handle sync and storage for that?

Every system has practical limits. Set a soft limit to warn people about pushing the app too far. Find out who your user with a million items is. Talk to them about their use cases. Figure out if you can improve the product, maybe offer a pro/higher-priced tier.

> Mobiles are really bad with memory. iOS and Android have insane level of restrictions on how much memory an app can consume, and for good reason because most consumer mobile phones have 4-6 gbs of RAM.

You don't load up your entire DB into memory on the backend either. (Well your database server is probably powerful enough to keep the entire working set in memory, but you don't start your request handler with "select * from users".)

You're asking very broad questions, and I know these are very simplistic answers - every product will be slightly different and face unique trade-offs. But I don't think the solutions are outside of reach for an average semi-competent engineer.
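To make "run the sync in a transaction" concrete, here is a minimal sketch using Python's `sqlite3`. The `items` table, `apply_remote_changes`, and the `flaky_download` generator are all hypothetical. The generator just simulates a connection that drops partway through a batch.

```python
import sqlite3

def apply_remote_changes(db, changes):
    """Apply a batch of (id, title) rows: either all of them land or none do."""
    # Used as a context manager, a sqlite3 connection commits if the block
    # finishes and rolls back if an exception escapes it.
    with db:
        for item_id, title in changes:
            db.execute(
                "INSERT OR REPLACE INTO items (id, title) VALUES (?, ?)",
                (item_id, title),
            )

def flaky_download():
    # Hypothetical network source that fails after delivering two rows.
    yield (1, "first")
    yield (2, "second")
    raise ConnectionError("network dropped mid-sync")
```

If `apply_remote_changes(db, flaky_download())` raises, the two rows already written are rolled back, so the local database is left exactly as it was before the sync started and the next attempt can simply retry.
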
For structured data, with compound entities, linked entities, both, or even both in the same entity, that can be a lot more complicated.
If a user has updated an object and some of its children, is that an atomic change, or might they want the child/descendant/parent/ancestor/linked updates to go through even if the others don't? All of them or some? If you can't decide this automatically (which you possibly can't in a way that will satisfy a large enough majority of use cases), how do you present the question to the user (bearing in mind this might be a very non-technical user)?
Also, what if another user wants to override an update that invalidates part or all of their own, or wants to try to merge the two? Depending on your app this might not matter: the user might always be me on different devices, likely using one at a time, which is easier to reason about than users interacting with others who may be making many overlapping updates.
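As a rough illustration of where the automatic part ends and the "ask the user" part begins, here is a sketch of a field-level three-way merge in Python. It is not a CRDT and only handles flat records. The function names and the `atomic` flag are invented for this example, and `None` is assumed to mean "field absent".

```python
def three_way_merge(base, local, remote):
    """Merge two edited copies of a flat record against their common ancestor.

    Returns (merged, conflicts). A field changed on only one side is taken
    automatically. A field changed differently on both sides stays at its
    base value and is reported so someone can choose.
    """
    merged, conflicts = {}, {}
    for key in sorted(set(base) | set(local) | set(remote)):
        b, l, r = base.get(key), local.get(key), remote.get(key)
        if l == r:        # both sides agree, or neither touched it
            value = l
        elif l == b:      # only the remote side changed it
            value = r
        elif r == b:      # only the local side changed it
            value = l
        else:             # both changed it, in different ways
            value = b
            conflicts[key] = {"local": l, "remote": r}
        if value is not None:
            merged[key] = value
    return merged, conflicts

def apply_change_set(base, local, remote, atomic=True):
    """With atomic=True, any conflict holds back the whole change set."""
    merged, conflicts = three_way_merge(base, local, remote)
    if conflicts and atomic:
        return dict(local), conflicts  # leave local untouched until resolved
    return merged, conflicts
```

Whether `atomic` should default to true, and whether it should cover child and linked records too, is exactly the product question raised above. The code only makes the two options explicit.
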