by cupcakestand 3252 days ago
Just about to decide between Mongo and Rethink. I know HN is not on good terms with Mongo but I'd like to have insights on...

- Setup of a replica set: is it easier/faster than with Mongo (which I find complicated)?

- Sharding: Easier than with Mongo?

- Rethink's query language: Is it really better than Mongo's after you used it for some time?

Happy to hear more insights if you know more noteworthy stuff, and please no bashing of either of them. Just trying to make an educated decision.

4 comments

Former rethinkdb developer here. Sharding and replication just work out of the box. Connect the servers to each other with "rethinkdb --join otherserver", then use the web dashboard or the client APIs to specify how many shards and replicas for each table. Send queries to any server in the cluster and they get routed automatically.
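
For the client API route, a minimal sketch in Python of what "specify how many shards and replicas for each table" can look like. It uses the driver's `reconfigure` command; the helper name `set_sharding`, the table name `users`, the host name and the shard/replica counts are all made up for illustration. The query namespace `r` and the connection are passed in so the helper does not assume a particular driver version.

```python
def set_sharding(r, conn, table_name, shards, replicas, db_name="test"):
    """Ask the cluster to re-shard and re-replicate one table.

    `r` is the RethinkDB query namespace and `conn` an open connection.
    `set_sharding` itself is a hypothetical helper, not part of the driver.
    """
    query = r.db(db_name).table(table_name).reconfigure(
        shards=shards, replicas=replicas
    )
    # run() sends the query to whichever server `conn` points at.
    return query.run(conn)


# Hypothetical usage against a running cluster (host name is an assumption):
#   from rethinkdb import RethinkDB   # newer drivers; older: import rethinkdb as r
#   r = RethinkDB()
#   conn = r.connect(host="server1", port=28015)
#   set_sharding(r, conn, "users", shards=2, replicas=3)
```

The replica count cannot exceed the number of servers in the cluster, so the numbers above assume at least three joined servers.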
> Send queries to any server in the cluster and they get routed automatically.

Nice! Also the writes? Mongo wants the writes to the primary only.

Yes, if you send a write to a server that's not the primary then rethinkdb will transparently forward it to the primary.
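
To make the forwarding idea concrete, here is a toy model in Python. It is not RethinkDB's implementation, just a sketch of the behaviour described above: a client can hand a write to any node, and a non-primary node passes it on to the primary. The `Node` class and all names in it are hypothetical.

```python
class Node:
    """Toy cluster member; only the primary applies writes."""

    def __init__(self, name, is_primary=False):
        self.name = name
        self.is_primary = is_primary
        self.peers = []   # other nodes in the cluster
        self.data = {}    # key/value store, only filled on the primary here

    def write(self, key, value):
        """Apply the write locally if primary, otherwise forward it."""
        if self.is_primary:
            self.data[key] = value
            return self.name  # report which node applied the write
        primary = next(p for p in self.peers if p.is_primary)
        return primary.write(key, value)


def make_cluster(names, primary_name):
    """Build nodes that all know about each other."""
    nodes = {n: Node(n, is_primary=(n == primary_name)) for n in names}
    for node in nodes.values():
        node.peers = [p for p in nodes.values() if p is not node]
    return nodes


cluster = make_cluster(["a", "b", "c"], primary_name="a")
applied_by = cluster["c"].write("user:1", {"name": "Ann"})  # sent to a replica
print(applied_by)  # a
```

The client never has to know which node is the primary; in this sketch the replica "c" accepts the call and the data ends up on "a".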
The problem with a feature comparison between Mongo and another database is that feature comparisons rarely include the features that Mongo lacks. Mongo looks good if you ask about sharding or replicating, but Mongo looks terrible if you ask about losing data or leaking memory. But nobody asks about losing data or leaking memory because they reasonably assume that a popular data store wouldn't do those things. That assumption, while reasonable, would be wrong.

Think of this like Maslow's hierarchy of needs. Sharding and replication are up at the top with self-actualization stuff like job satisfaction and feeling love toward animals. Meanwhile Mongo is missing the real basic stuff, like actually keeping your data, which is down with food, water, and shelter. You might think you care about replication, but you don't care about replication when both your database servers run out of memory at the same time.

> but Mongo looks terrible if you ask about losing data or leaking memory

Sorry, but this is just not true for many years now. Mongo has its warts but reading old stuff again and again feels strange and like pure hate. If your claims have some recent sources I am happy to hear them.

The list of problems goes on, but if you want something really current: at a local presentation a few weeks ago, a user who's actually pretty positive about MongoDB reported data loss without hardware failure, with what is supposedly a safe configuration. If even experienced people who like Mongo are still losing data eight years after public release, there's a problem. The claim that it's "not true for many years now" is the result of buying into PR and fanboyism.
Again the same old accusations without a single source ('local presentation', of course!).
February 2017: https://jepsen.io/analyses/mongodb-3-4-0-rc3

I would expect these bugs to keep occurring, because they're still approaching the development of the consensus algorithm incorrectly. Instead of formally proving correctness, they're doing something that seems right and then patching bugs when they are discovered.

Depending on what your goal is: Did you consider elasticsearch?

Replication/clustering is real easy to set up.

Sharding is easy.

Scaling is easy.

I can't comment on query language vs Mongo or Rethink, since my experience with them is too limited.

For one thing, it isn't really a database. You can use it as one, but where elasticsearch shines is mostly write once and read often.

Correctness isn't the primary focus, so unless you are willing to lose some stuff, don't use elasticsearch as your primary database.

https://www.elastic.co/blog/found-elasticsearch-as-nosql

A couple more references regarding Elasticsearch as a primary datastore:

https://www.quora.com/Why-shouldnt-I-use-Elasticsearch-as-my...

https://stackoverflow.com/q/29841348/266535

After some years with Mongo I would not use Mongo again (we had some site outages).

If you do not need to push to the client, I'd use Postgres. It just works. For replication and sharding, better wait for Postgres 10 though.

Can I use Postgres fully schemaless, without even defining any collection (table) or db before doing the first write (which is possible with Mongo)?
Nope. However, "time to hello world", while certainly a large factor in the success of development tools, is not always the most relevant criterion from an engineering perspective. Cf. PHP.