Hacker News new | ask | show | jobs
by taotau 1864 days ago
What sort of systems are you building ? I’m guessing some variation of CMS, shopping cart or simple crud backoffice reporting systems.

I’ve spent 14 years working for a small agency building these sorts of systems. I kept up with all the cool new stuff that’s come out and played with it but all we ever needed was 2 load balanced app servers with a single instance Postgres database and we ran some pretty chunky systems.

I’m now looking for a new position and this attitude has come up to bite me. I’ve always been dismissive of the new dangled complexity but turns out there are systems out there that churn through so much data that mongodb makes sense, and large teams of devs where splitting systems into separate services and writing reusable components is required which is where frameworks and inter process communication tools come in.

Most things exist because there is a need for them.

3 comments

The last thing I built was a CMS structurally (a new non-commercial knowledge service), took about ~18 months to build. I plan to work on it for the rest of this decade approximately.

Currently working on a couple of trading systems (for physical objects, not financial). That's in the ecommerce sphere technically, although they're not monetary transaction based. The first one will hopefully launch in June. I might Show HN it.

Both work perfectly with the setup I described. I build in my wheelhouse though. I can build services like Stack Exchange, Quora, Wikipedia or Reddit (or most things in standard ecommerce, ecommerce platforms). I know what my stack is good at, and what it's not appropriate for. And at this point in my life I know well what I'm good at and what I enjoy, so I look for opportunities there.

> so much data that mongodb makes sense

I don't want to dismiss this without any arguments but please do the basic research before you choose the db for your next project.

Parent is probably referring to how MongoDB has consistency issues and isn't recommended by third party testers like Jepsen.

https://www.infoq.com/news/2020/05/Jepsen-MongoDB-4-2-6/

MongoDB IMHO also has resource allocation issues, because for a simple two "node" distribution setup you have ~4 times the demand for resources than a "regular" clustered service like Postgre or Oracle. Added to that the actual promised efficiency relies heavy on the data model and has no real easy way to correct once deployed.
How do you get four times the demand for resources from two nodes?

The "promised efficiency" of any system is a complex question. The dynamic schema approach of MongoDB makes it trivial to update schema design on the fly so I am confused by "no real easy way to correct once deployed".

BTW : The recommended production setup is three nodes (one primary, two secondaries). This is what you get by default in MongoDB Atlas (you can't go smaller).

(I work for MongoDB)

I was talking about a two "node" Mongo setup which would mirror the capabilities of a classic two node setup of Oracle RAC for example. Since in Mongo this is not a clear 1:1 match due to a different model of operation, in order to get the same performance/continuity as RAC here is what I mean.

If I want to gain a relative 100% increase in concurrent read/write access and transparent session failover to my application I have to procure two servers(with roughly the same performance lets put it at rawPerf=1.0 for simplicity) and a SAN/NAS with SCSI HBA and that would be it. All the necessary components will be running on those two servers and the application will have access to almost all capabilities of a single instance with a roughly twice the performance and fault tolerance. To do that in Mongo, I would have to procure 2 replica sets with 3 instances each(bare minimum to get the fault tolerance and not use arbiters which according to Mongo University is a bad idea) for the actual data and another RS for the configuration meta-data, which is another 3 instances. I will also need instances where I can run my "mongos" to route/consolidate my requests and to have fault tolerance I will need at least two of those. Now to do a break down of the different instances. From the two replica sets for data, I can use only 2 of the instances for writes and the remaining four only for consistent reads and that is only with the appropriate write concern. And while I have two instances for writes I am not guaranteed to get a consistent increase in write operations unless the data is at least somewhat evenly distributed. I will need six rawPerf 0.5 servers(assuming 0.5 rawPerf is capable to handle roughly half the data), that is a total of 3.0 so far. The configuration RS while not as demanding as the actual data RS will still require resources so I count it as roughly between 0.5-1.0 total for all three instances. And finally I have the "mongos" instances which although will not need disk space will need some CPU and the same amount of RAM as a regular instance in case the queries and not optimized, so I will have to give it 0.5 per instance. And the total amount of resources comes to 3.0+0.5+1.0=4.5. On the other hand with the Oracle RAC I would've needed 2.0.

As for the schema flexibility, from the Mongo documentation I take it that once the shard key is created it can't be changed arbitrarily. It can only be further fine-grained by adding more keys to it, however this is available from version 4.4. In order to change it cardinally, I would have to recreate the collection.

I'm sure you are well aware of all this, however I'm just stating my experience with MongoDB so far.

AH I see. When we talk about nodes in MongoDB they are the elements of a replica set. That is what confused me. Yes, you are right, to get the equivalent performance of a two node database server you would have to run a sharded cluster with two shards. That is definitely more nodes than the two node design you have outlined for the RAC like design.

I am not familiar with Oracle RAC architecture but casual mention of a SAN fails to recognise the cost and complexity of even the smallest SAN. On the other hand MongoDB can run on dumb disks without a SAN or RAID required (though RAID can help with performance).

With you two node system loss of a node results in a 50% loss in performance whereas with MongoDB loss of a node will result in a secondary taking over the primary role with no loss in performance. You get what you pay for.

Setting up such clusters used to be a PITA until the advent of MongoDB Atlas. Now you can set them up with a few mouse clicks.

As regards shard keys. Yes, they are immutable, but the documents they index are not. So dynamic schema properties are preserved in a sharded cluster.

Taotau my aesthetic is similar to yours, but it's true that we are a minority. Devs are, as a group, perceived as technology maximalists that love new toys, and the truth of the perception is reflected in the software landscape. Libraries like Angular, RxJs, and Webpack are all packed to the gills with features! It's also reflected by all the "slightly better" options we get in our tooling (yarn v npm) and (scss v css) and (ts v js) and even (ts v es)lint. But without exception, all of those incremental improvements still require that you have total mastery of the browser's native runtime, html5 + css3 + ES6 + dom + async, and all the protocols it enables, HTTP, TCP/IP, DNS, and all the infrastructure that's grown around it all (CDNs and load balancers and hypervisors).

Devops is in a similar state! There is growing consensus that "Container orchestration" is something like the right problem to solve, and people seem to like Kubernetes, but tools like Puppet, Ansible, Salt are still in heavy usage, and are still very useful, but they solve a different problem. Even within k8s there is debate about the proper container runtime (Docker has fallen out of favor for containerd). But all of this assumes total mastery of linux, ssh, bash, systemd, vim, sudo, mount, ps, and all the distribution variations.

Java is in a similar state! Widely considered "boring technology", Java (lang) 8 introduced streams and lamdas, which is a very different way to code (and hence, to think), and emphasizes functional style over object-oriented programming. Then you get to pick which Java you want (Oracle, RedHat, ?), and how you want it installed (installer, brew, unzip, docker), and now your build tool (Maven, gradle, play, ?) Then if you're making a "typical" webapp you have to choose your VCS (git, mercurial, svn, cvs), code hosting provider (github, gitlab), your starting point - Spring, Spring Boot, Dropwizard - which assumes you want REST, when in fact you may want to try GraphQL, which may lead you to nodejs, and it's complexity.

Picking system dependencies is similarly complicated. Components we can choose from - message queues (rabbitmq, kafka, activemq, ?), caches, proxies (nginx, httpd), and here we have to balance many considerations, such as "is there a widely trusted image I can configure and use as is?", and your options, and the details used to deploy them, will depend on every other decision you've made. In fact, all of these choices are made in away that affects some or all of the others, in a way that is complex and difficult to articulate. This is certainly why certain combinations get acronyms (LAMP, Jamstack, etc[0]), and why its very hard to have an intuitive picture.

[0] https://en.wikipedia.org/wiki/Solution_stack

> Devs are, as a group, perceived as technology maximalists that love new toys,

Make that web devs, if at all. In many other places where software is written, it's closer to the opposite: Managers tend to be minimalists, and the more "adventurous" developers lobby for adopting newer libraries, or use a new language for some side-project.