Hi, one of the developers here. There was no specific reason why we went with Mongo at the start, but there have been talks about moving to a different database if needed. Currently MongoDB is fine for us.
Just want to suggest that if you're interested in doing this in the future, some investment in defining interfaces up front is worth doing. I took a quick look at the codebase and it looks like mongo is "in there pretty good"[0] without any abstraction to make it easily shimmable.
Just a little specification around that interface (Trait) will go a long way to making other backends possible and should make it much easier to know and manage the API contract a capable database must provide.
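To make the Trait suggestion concrete, here is a rough sketch of what such an interface could look like. Everything in it is hypothetical: `MessageStore`, `Message`, `StoreError`, `MemoryStore`, and `post` are names I made up for illustration, not anything from the actual codebase, and I've kept it synchronous for brevity even though a real backend would most likely be async.

```rust
use std::collections::HashMap;

/// Hypothetical message record; the field names are illustrative only.
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub id: u64,
    pub channel_id: u64,
    pub content: String,
}

/// Backend-agnostic errors, so callers never see a driver-specific type.
#[derive(Debug, PartialEq)]
pub enum StoreError {
    NotFound,
    Duplicate,
}

/// The contract any capable database would have to satisfy (assumed shape).
pub trait MessageStore {
    fn insert(&mut self, msg: Message) -> Result<(), StoreError>;
    fn fetch(&self, id: u64) -> Result<Message, StoreError>;
    /// Messages in a channel, ordered by id.
    fn list_channel(&self, channel_id: u64) -> Vec<Message>;
}

/// In-memory backend: useful for tests and as a reference implementation.
#[derive(Default)]
pub struct MemoryStore {
    messages: HashMap<u64, Message>,
}

impl MessageStore for MemoryStore {
    fn insert(&mut self, msg: Message) -> Result<(), StoreError> {
        if self.messages.contains_key(&msg.id) {
            return Err(StoreError::Duplicate);
        }
        self.messages.insert(msg.id, msg);
        Ok(())
    }

    fn fetch(&self, id: u64) -> Result<Message, StoreError> {
        self.messages.get(&id).cloned().ok_or(StoreError::NotFound)
    }

    fn list_channel(&self, channel_id: u64) -> Vec<Message> {
        let mut out: Vec<Message> = self
            .messages
            .values()
            .filter(|m| m.channel_id == channel_id)
            .cloned()
            .collect();
        out.sort_by_key(|m| m.id);
        out
    }
}

/// Application code depends only on the trait, never on a concrete backend.
pub fn post(
    store: &mut dyn MessageStore,
    id: u64,
    channel_id: u64,
    content: &str,
) -> Result<(), StoreError> {
    store.insert(Message { id, channel_id, content: content.to_string() })
}

fn main() {
    let mut store = MemoryStore::default();
    post(&mut store, 1, 10, "hello").unwrap();
    post(&mut store, 2, 10, "world").unwrap();
    println!("{:?}", store.list_channel(10));
}
```

The point is that a Mongo-backed, Postgres-backed, or Scylla-backed type would each implement the same trait, and callers like `post` would not need to change.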
We (Discord) moved off of MongoDB for various reasons and are quite happy about that decision but managing Cassandra/Scylla clusters is not exactly a walk in the park either.
I didn't make the original decision but if I were starting something and I had no idea whether or not it'd be successful, I'd do whatever was the absolute fastest way to get to MVP. That'd probably be a cloud database, honestly -- but a modern MongoDB would be technically fine too (licensing stuff notwithstanding.)
Most startups fail not because they picked a suboptimal database for their usage but because they didn't build something that was good or it didn't achieve product market fit. I wouldn't worry about your database over-much in the beginning (unless it's critical to what you're doing and in that case, worry like hell, but you will probably know if that's the case.)
Many of Discord's issues with Mongo were exacerbated by the fact that we were using TokuMX, which was abandoned shortly after we started using it. A few years into Discord we found ourselves with a rapidly scaling dataset and userbase that was built on top of an abandoned and not super popular third party version of MongoDB. (Funny story: at one point towards the end we realized that all of the packages had been pulled from every mirror we could find and literally the only place we could find the package files was off of some gov.uk mirror... that was a bad day. Thankfully we had the hashes and were able to validate the packages...)
FWIW, we did honestly debate moving our core user model (which was what was left in TokuMX by the end there) into a modern version of MongoDB -- some of the things we did (reverse indexes, secondary indexes, locking, etc) are much more complicated in a database like Scylla. It was tempting to just migrate the data from one "Mongo" to another and call it a day.
We didn't for a variety of reasons, not least of which is keeping things simple by reducing the number of technologies you have in production (like when we chose to embrace Rust we went back and migrated nearly all of our Go systems).
Anyway, I'm pretty happy with not running MongoDB anymore, but not because MongoDB is inherently bad. It's popular for a reason!
Really appreciate this great, detailed answer! 100% agree that getting to an MVP with PMF as quickly as possible should be the top priority for a startup.
It is not "just fine". They do not have encryption on the wire by default and require an enterprise plan, which is percentage of income based, to enable it.
As I said, if one can afford it. Also not sure what you mean by wire encryption is on for enterprise plan and income based. Their plans are size based.
I worked with them and had meetings with them in person. SSO and TLS were Enterprise-only features, and enterprise pricing was percentage based. It was insanely expensive.
Interesting. Probably it's different now. We use both SSO and TLS, and TLS is on by default now. Maybe they changed their plans since you spoke with them.
I'd probably recommend ScyllaDB for NoSQL. It's a replacement for Cassandra (which it's mostly compatible with) which is more or less the de facto standard for large NoSQL deployments at big companies, but it's written in C++ rather than Java so it's even faster (and has more consistent latency) and easier to deploy. And it's been around long enough at this point that it's established and not likely to just disappear.
It's a shame your devs don't like SQL. It's probably my most useful developer skill. Saves so much time elsewhere. Having said that, a messaging app that you really want to scale HUGE (like Discord or Facebook Messenger huge) is one place where the NoSQL solutions are justified.
I'd suggest you and your team shouldn't rule out SQL-like databases, given a lot of very competent NoSQL databases have SQL-like syntaxes (say Cassandra or ScyllaDB, what Discord went with). And regarding hiding it behind an ORM, if you want or need cream of the crop performance not only do you need to have chosen a database that fits your needs, but you will also need to occasionally work very close to the database to avoid abstraction inversion situations.
“Best” depends on your use case. I don’t even think a SQL database is an option for a chat app since it’s write heavy. FB Messenger uses HBase and Discord uses Cassandra and they’ve done their research at scale, so those could be possible options.
Which self hosting instance would go at the size of Facebook or Discord?
Just because Discord took a particular approach doesn't mean a competing product should too.
People opting to use a meme "database" instead of a real DBMS is kind of a large red flag to me. I am hard pressed to imagine data more relational than chat messages and user accounts.
I refuse to believe this is due to technical reasons, as long as we are talking ACID compliant data storage. A well configured PostgreSQL will blow MongoDB out of the water while actually caring about your data.
[0]: https://github.com/revoltchat/delta/blob/master/src/database...