by mbalex99 1402 days ago
My startup built an SDK framework that allows companies to build offline apps that can even sync over a multihop mesh network. It’s powered by a CRDT to handle the conflict resolution.

From my experience, most of our customers are not mobile developers (they’re JavaScript developers asked by their bosses to build mobile apps) and don’t even want to spend time learning Core Data for iOS or Room for Android. So most of the time they just stick to what they know: HTTP requests.

Luckily we made our SDK easy to use so that these JavaScript developers can get both the network communication and the offline first caching in one product. They absolutely love it.

Most apps aren’t offline first because it’s so hard to build the infrastructure to pull it off without a lot of bugs. Most apps focus entirely on the UI code layer. If infrastructure and frameworks made this a lot easier to use, I bet offline first would be a lot more popular and users would be a lot happier.

2 comments

Are there similar services to your company's (https://www.ditto.live/) out there already? I haven't had the need for this yet, but I can see your product being pretty damn useful
Ditto is the real deal and the best CRDT database out there atm.

There are expressive CRDT libraries like Yjs and Automerge but these don’t give you persistence.

There are mature databases out there that include CRDTs, like Redis and Postgres EDB, but these have very limited semantics.

Ditto is a proper database with a rich set of CRDTs. It’s well engineered, the team are great and it has a compelling DX / realtime APIs.

Source: working with CRDTs at Vaxine.io

Thanks so much for the kind words. We have a huge set of write ups and presentations on our delta state CRDTs coming soon! Stay tuned!
I am sure it is, but the good old “sign up and we’ll tell you how much this magic costs” is a marketing pattern that I don’t trust. If you can’t be transparent about price, I don’t feel confident in giving you my info.
Regarding pricing, we are actively working on generally available pricing for our entire system. It is going to take us a couple of quarters to operationalize everything, but we have a pretty good idea of what the pricing will be for the everyone-version.

Right now we have our enterprise pricing nicely figured out. This might sound a bit strange, but unlike many startups, we actually went to market backwards by focusing on the enterprise first. When we started the company, we had this crazy idea that we could build a distributed, query-replicated database that could sync over a mesh network. No one had ever built anything like this and we had some skepticism that such a product could actually generate enough revenue to create a sustainable business. Eventually, most infrastructure or database startups survive in large part thanks to enterprise deployments. So we sought enterprise use cases and dollars first. And I have to say that was totally worth it and we've been rapidly gaining customers here.

However, in our hearts, we are all frontend web, mobile, and IoT app developers who wanted to build collaborative apps like iMessage, Figma, Miro, and Trello that had offline, sync, and conflict resolution capabilities out of the box. Once we make a couple of product and ops advancements over the next couple of quarters, we will have very clear pricing like any other service out there. So just hang tight!

Want us to get to pricing faster? Definitely recommend some of your infrastructure, Kubernetes, and Rust distributed systems developers to join us. We are hiring hardcore!

If you know little about a potential client's use case, how do you price the product?
Regarding similar services? Kind of!

Ditto is both an embedded + cloud database + a mesh network, it's really 2 startups in 1.

A) If you're looking for just offline-first and data sync, there are some companies that have done this:

1. Realm (MongoDB) - this is the company that my co-founder and I came from.

2. Firebase (Google) - one of the biggest inspirations for me personally in data sync.

3. Supabase - a very popular, growing open source Firebase alternative.

4. All the GraphQL Backend-as-a-Service like Prisma, Hasura etc... These have offline caching with a lot of the GraphQL client libraries. I don't think they actually have a database underneath the hood that you can query.

B) If you're looking for just mesh networks:

1. Build it yourself using the Bluetooth Low Energy, Local Area Network, P2P Wi-Fi Direct, Apple Wireless Direct, and Wi-Fi Aware APIs that come with most of your device frameworks. Build an advertising system, a common communication protocol, and add your identity security system. If you want multi-hop, you'll need to create a dynamic routing and presence system on top of it. After that, design an API to send data around and respond to errors. If you want offline-first, you should research CRDTs and try to build a database replication system using the mesh network.

2. You could use Apple's Multipeer Connectivity framework: this is for iOS and macOS devices only. No multi-hop here, but you can build a system on top of it. One thing I've noticed is that Apple's framework is a ruthless battery drainer. My phone gets very hot after a minute. It doesn't look like it uses Bluetooth Low Energy, and its advertising system seems to be extremely aggressive.

3. Google has an abstraction called Nearby Messages that uses Bluetooth Low Energy. It isn't very stable, but you could try to trick it into re-establishing connections. After that you'll want to investigate how to pull off multi-hop. It's the same as step 1 and 2. https://developers.google.com/nearby/messages/overview

4. There was a company called Hypelabs that offered a mesh network solution, but not the offline-first part. I'm not sure what's up with them.

5. There's another company called Bridgefy https://bridgefy.me/ that built a chat app used in some of the Hong Kong protests.

6. Open Garden also had Firechat in 2014.

Ditto is a combination of both families of problems, it's basically creating 2 startups at the same time (mesh + distributed database):

* An offline-first embedded mobile, web, and IoT database called the Small Peer

* A large distributed database in the cloud called the Big Peer (this is new and what we need to operationalize for general availability pricing)

* A replication engine that uses our mesh network powered by Bluetooth Low Energy, Local Area Network, P2P Wi-Fi Direct, Apple Wireless Direct, and Wi-Fi Aware

The problems that we have to tackle are so crazy: network optimizations, compression, multiplexing, conflict resolution, scaling on the edge and cloud, etc. It's like the product that we're trying to create is teaching us as we build. For example, one of the challenges that we have now with multi-hop is scaling performance. A large mesh of 1,000 devices may chatter so much just on the distributed routing table that it can cripple the replication of the actual data! So we are trying novel ways to dynamically route data by also incorporating special characteristics of CRDTs so that chatter is reduced and performance increases. Other major things we will improve are ways to prevent denial-of-service attacks even with trusted actors, decentralized access control of data, graph centrality theory, etc.

Regarding use cases?

1. Well, anything that's latency sensitive is perfect for us. Think controlling robots, syncing whiteboard pen strokes across devices, games, VR+AR.

2. Industry wise, any place where _any_ issue with internet connectivity means a loss of money, life, or user experience: aviation, hospitals, point of sale, education, manufacturing, defense. A lot of our customers have internet 99.9% of the time, but even that 0.1% is a nightmare that causes great issues.

Hey, Nikolas from Prisma here.

> 4. All the GraphQL Backend-as-a-Service like Prisma, Hasura etc...

Just wanted to quickly drop in to clarify that Prisma is not a GraphQL-as-a-Service tool any more but an ORM that gives you a type-safe JavaScript/TypeScript client for your DB and a migration tool. The main differences between Prisma 1 and the Prisma ORM (i.e. Prisma 2+) are explained here: https://www.prisma.io/docs/guides/upgrade-guides/upgrade-fro...

Ah sorry! :-)

You see how hard it is to change perceptions? This is why people spend so much time on branding for developer tools, databases, and infrastructure companies.

> It’s powered by a CRDT to handle the conflict resolution.

Can you go into more detail here?

Sure! In our documentation we go over some of the details here: https://docs.ditto.live/javascript/common/how-it-works/crdt

I'm not sure if you know what CRDTs are, but they're a family of data types that allow different actors in a distributed system to edit data concurrently, even during network partitions. Once the peers have exchanged the same set of updates, they will deterministically agree on the same value. They kind of give that "Google Docs" behavior if you're looking for an analogy. They're perfect for peer-to-peer and offline-first systems.
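
A minimal sketch of that convergence property, assuming the simplest possible CRDT (a grow-only set whose merge is a plain set union). The peer names and the `merge` / `merge_all` helpers are made up for illustration and are not Ditto's API:

```python
from itertools import permutations


def merge(a: frozenset, b: frozenset) -> frozenset:
    """Join two grow-only set states. Union is commutative, associative and idempotent."""
    return a | b


def merge_all(states) -> frozenset:
    """Fold a sequence of peer states into one, in the order given."""
    result = frozenset()
    for state in states:
        result = merge(result, state)
    return result


# Three peers edit the same shopping list while partitioned from each other.
alice = frozenset({"milk"})
bob = frozenset({"eggs"})
carol = frozenset({"milk", "bread"})

# Whatever order the states arrive in, every peer ends up with the same value.
outcomes = {merge_all(order) for order in permutations([alice, bob, carol])}
converged = len(outcomes) == 1
final_state = merge_all([alice, bob, carol])
```

Because the merge does not care about ordering or duplicates, peers can exchange state over any path, any number of times, and still agree.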

However, there is actually more to it, and a much more detailed write-up is coming soon. Ditto is a distributed database; each peer has its own database. The database is organized into collections and each collection is a Ditto Document (this does not work like most NoSQL document databases). Each property of the document is its own CRDT. You as the user can pick which CRDT you'd like to use. Our current catalog includes:

* Registers (Causal Last Write Wins)

* Counters (sums of each writer's numeric values)

* Binary Attachments (same as a Register but you can put large arbitrary data like say video files, images, PDFs whatever)

* AddWinsMaps (coming soon) - This type allows for concurrent upserting and removing of values based on a key.

* ReplicatedGrowableArray - this is an array type that allows for concurrent insertions while preserving some semblance of order. It behaves rather closely to a collaborative text editor merge behavior.
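
To illustrate the "sums of each writer's numeric values" idea from the list above, here is a rough sketch of a grow-only counter. The class name `GCounter` and the writer IDs are hypothetical, and this version only supports increments; Ditto's actual counter may work differently:

```python
class GCounter:
    """Grow-only counter: one running total per writer, value is the sum of all totals."""

    def __init__(self, writer_id: str):
        self.writer_id = writer_id
        self.totals = {}  # writer id -> that writer's running total

    def increment(self, amount: int = 1) -> None:
        if amount < 0:
            raise ValueError("a grow-only counter cannot be decremented")
        # A writer only ever touches its own slot, so writers never conflict.
        self.totals[self.writer_id] = self.totals.get(self.writer_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Per-writer totals only grow, so the larger one is always the newer one.
        for writer, total in other.totals.items():
            self.totals[writer] = max(self.totals.get(writer, 0), total)

    @property
    def value(self) -> int:
        return sum(self.totals.values())


phone = GCounter("phone")
tablet = GCounter("tablet")
phone.increment(3)    # both devices count while offline
tablet.increment(2)

phone.merge(tablet)   # sync in both directions
tablet.merge(phone)
phone.merge(tablet)   # a repeated merge changes nothing
```

After the exchange both devices report 5, and neither increment is lost, which is what a single last-write-wins number could not give you.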

Our AddWinsMap and ReplicatedGrowableArray are more special than you might think. They can actually host nested CRDTs. Think of it like a folder within a folder in Google Drive that can hold synced documents nested within.
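
A sketch of what "nested CRDTs" can mean in practice: a map merges key by key and delegates to whatever CRDT sits under each key, which may itself be another map. The `WriterCounter` and `GrowOnlyMap` names are invented for this example. It assumes both sides store the same type under a given key, and it leaves out removals entirely, so it is not the add-wins behavior described above:

```python
from dataclasses import dataclass, field


@dataclass
class WriterCounter:
    totals: dict = field(default_factory=dict)  # writer id -> running total

    def merge(self, other: "WriterCounter") -> "WriterCounter":
        writers = self.totals.keys() | other.totals.keys()
        return WriterCounter(
            {w: max(self.totals.get(w, 0), other.totals.get(w, 0)) for w in writers}
        )

    @property
    def value(self) -> int:
        return sum(self.totals.values())


@dataclass
class GrowOnlyMap:
    entries: dict = field(default_factory=dict)  # key -> WriterCounter or GrowOnlyMap

    def merge(self, other: "GrowOnlyMap") -> "GrowOnlyMap":
        merged = dict(self.entries)
        for key, theirs in other.entries.items():
            mine = merged.get(key)
            # Keys seen on only one side are copied; shared keys recurse into the nested CRDT.
            merged[key] = theirs if mine is None else mine.merge(theirs)
        return GrowOnlyMap(merged)


phone = GrowOnlyMap({"room1": GrowOnlyMap({"visits": WriterCounter({"phone": 2})})})
tablet = GrowOnlyMap({
    "room1": GrowOnlyMap({"visits": WriterCounter({"tablet": 1})}),
    "room2": GrowOnlyMap({"visits": WriterCounter({"tablet": 4})}),
})

merged = phone.merge(tablet)
room1_visits = merged.entries["room1"].entries["visits"].value
```

The merge never mutates its inputs, and merging in either direction gives an equal result.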

I'd love to show you, perhaps over a call! We tend to be perfectionists when it comes to documentation and have been so busy that we haven't fleshed it all out. Perhaps we might just open source our CRDT system.

Email is in my profile, I love chatting and sharing about this stuff!

Reformat the list, please, HN's parser doesn't like it.
Email sent :)
A CRDT is a way to solve multi-leader replication without having the application code resolve conflicts.

This is what is required to build an app where any instance (node) can be offline for an arbitrary amount of time, but still be able to share state with the rest of the nodes when it's reconnected.

To implement this, every application node keeps a vector clock per register (an atomic piece of shared state). The vector clock allows any node to compare its own version of the register with the state received from any other node. Two values of a vector clock can either be causally related (in which case the most recent write wins) or concurrent. However, "concurrent" here is from the system's perspective, not necessarily from the user's perspective. An extra physical timestamp can be kept at the register level to order concurrent updates in a way consistent with the user's time perception.
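
A rough sketch of that comparison, assuming vector clocks are plain dicts from node ID to counter. The `Version` record, the `compare` and `resolve` functions, and the node-ID tie-break for identical timestamps are illustrative choices, not taken from any particular product:

```python
from dataclasses import dataclass


def compare(a: dict, b: dict) -> str:
    """Return how clock a relates to clock b: 'equal', 'before', 'after' or 'concurrent'."""
    nodes = a.keys() | b.keys()
    a_le_b = all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
    b_le_a = all(b.get(n, 0) <= a.get(n, 0) for n in nodes)
    if a_le_b and b_le_a:
        return "equal"
    if a_le_b:
        return "before"
    if b_le_a:
        return "after"
    return "concurrent"


@dataclass(frozen=True)
class Version:
    value: str
    clock: dict        # vector clock: node id -> number of writes seen from that node
    timestamp: float   # physical time of the write, used only for concurrent versions
    node_id: str       # writer, used to break timestamp ties deterministically


def resolve(local: Version, remote: Version) -> Version:
    order = compare(local.clock, remote.clock)
    if order in ("after", "equal"):
        return local
    if order == "before":
        return remote  # causally newer wins, whatever the wall clocks say
    # Concurrent: fall back to physical time, then remember that both sides were seen.
    winner = max(local, remote, key=lambda v: (v.timestamp, v.node_id))
    nodes = local.clock.keys() | remote.clock.keys()
    joined = {n: max(local.clock.get(n, 0), remote.clock.get(n, 0)) for n in nodes}
    return Version(winner.value, joined, winner.timestamp, winner.node_id)


base = Version("draft", {"phone": 1}, 100.0, "phone")
skewed = Version("draft v2", {"phone": 2}, 90.0, "phone")  # later write, earlier wall clock
phone_edit = Version("phone edit", {"phone": 2}, 100.0, "phone")
tablet_edit = Version("tablet edit", {"phone": 1, "tablet": 1}, 105.0, "tablet")

causal_winner = resolve(base, skewed)
concurrent_winner = resolve(phone_edit, tablet_edit)
```

Here `causal_winner` is the "draft v2" version despite its earlier timestamp, because its clock dominates. `concurrent_winner` is the tablet's edit, chosen by timestamp only because neither clock dominates the other.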

Now, having the hybrid clocks in place to version each register on each node, the system must implement a protocol to ship every register update to all nodes (reliable broadcast).

Once all updates are shipped to all nodes, it's guaranteed that all nodes have the same (most recent) state.
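
The shipping step can be sketched as a pairwise exchange of missing updates, repeated until everyone has everything. This is a simplified illustration, not a full reliable-broadcast protocol: the `Node`, `Update` and `sync` names are hypothetical, and for brevity registers are ordered by physical timestamp alone rather than by the hybrid clock described above:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Update:
    origin: str       # node that made the write
    seq: int          # per-origin sequence number; (origin, seq) identifies the update
    key: str          # which register is written
    value: str
    timestamp: float  # physical time of the write


@dataclass
class Node:
    name: str
    log: dict = field(default_factory=dict)        # (origin, seq) -> Update
    registers: dict = field(default_factory=dict)  # key -> winning Update
    next_seq: int = 0

    def write(self, key: str, value: str, timestamp: float) -> None:
        update = Update(self.name, self.next_seq, key, value, timestamp)
        self.next_seq += 1
        self.deliver(update)

    def deliver(self, update: Update) -> None:
        uid = (update.origin, update.seq)
        if uid in self.log:
            return  # duplicates are ignored, so re-sending is harmless
        self.log[uid] = update
        current = self.registers.get(update.key)
        rank = lambda u: (u.timestamp, u.origin, u.seq)
        if current is None or rank(update) > rank(current):
            self.registers[update.key] = update

    def state(self) -> dict:
        return {key: update.value for key, update in self.registers.items()}


def sync(a: Node, b: Node) -> None:
    """Exchange whatever updates each side is missing."""
    missing_in_a = [u for uid, u in b.log.items() if uid not in a.log]
    missing_in_b = [u for uid, u in a.log.items() if uid not in b.log]
    for update in missing_in_a:
        a.deliver(update)
    for update in missing_in_b:
        b.deliver(update)


a, b, c = Node("a"), Node("b"), Node("c")
a.write("title", "draft", 1.0)
c.write("title", "final", 2.0)
b.write("owner", "bob", 1.5)

# a and c never talk directly; b relays between them.
sync(a, b)
sync(b, c)
sync(a, b)
```

After the three exchanges all nodes hold the same log and therefore the same register values, even though `a` and `c` only ever communicated through `b`.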

(I built an offline-first product and had to roll my own protocol)

Nice writeup.

I've read the CRDT paper but never implemented it. Question - if you're not using LWW (instead you have concurrent values of a vector clock), this is where you have your CRDT and merge the states coming from every node?