Hacker News
by KronisLV 640 days ago
> Curious to know your thoughts on how you manage your infrastructure.

What I quite like about your repo:

  - there is a separate API and background job instance
  - there is a separate web image, to not always couple front end deployments to back end
  - there are specialized data stores like Redis (or maybe RabbitMQ or MinIO in a different type of project)
  - Dozzle seems nice: https://dozzle.dev/ (I mostly use Portainer, but it seems useful)

What I think works quite nicely in general:

  - starting out with a monolithic back end but making it modular with feature flags (e.g. FEATURE_REPORTS, FEATURE_EMAILS, FEATURE_API), so that you can deploy vastly different types of workloads in separate containers BUT don't duplicate your data model and don't need to extract shared code libraries (yet); if you ever need to split the codebase into multiple separate ones, it won't be *too* hard to do
  - having a clear API (RESTful or otherwise) as the contract between a separate back end and front end deployment, so that even if your SPA technology gets deprecated (AngularJS, anyone?) you can migrate to something else, unlike with SSR, where everything is coupled
  - the same applies to NOT having a single container build process build both the front end and back end (I've seen a Java project install a specific Node version through Maven, and then the build dragged on because Maven ended up processing thousands of files as a part of the build)
  - using the right tool for the job: many might build full text search, key-value storage, message queues, JSON document storage, even blob storage all on PostgreSQL, and that might be okay; others will go for separate instances of Elasticsearch, Redis, RabbitMQ, something S3 compatible and so on; it's probably a tradeoff between using well known libraries and tools vs building everything yourself against a single DB instance
  - in my experience, many projects out there are served perfectly fine by a single server, so Docker Compose feels like the logical tool to start out with; if multiple instances do become necessary, there is always Docker Swarm (yes, still works, very simple), HashiCorp Nomad, or K3s or one of the other more manageable Kubernetes distros
  - self-hosted (or self-hostable) software in general is pretty cool and gives you a bunch of freedom, though using managed cloud services will also be pleasant for many: more expensive upfront, but less so in terms of your own time spent managing the stack; self-hosting also lends itself nicely to being able to launch a local dev environment with the full stack, which feels like a superpower (being able to really test out breaking migrations, look at what happens with the whole stack etc.)
  - having some APM and tracing is nice, something like Apache SkyWalking was pretty simple to set up, though there are more advanced options out there (e.g. cloud version of Sentry, because good luck running that locally)
  - having some uptime monitoring is also very nice, something like Uptime Kuma is just very pleasant to use
  - heck, if you really wanted to, you could even self-host a mail server: https://github.com/docker-mailserver/docker-mailserver (though that can be viewed as a hobbyist thing), or have MailCatcher / Inbucket or something for development locally

I'm a big fan of the modular monolith pattern, but haven't used feature flags for the purpose you're describing. Do you use any specific tools or frameworks for that? I'd also imagine there would be calls between features from within the same codebase, do those become network calls? And how does this interact with your Docker Compose/single server recommendation?

> Do you use any specific tools or frameworks for that?

You don't need to: you can just enable/disable certain features during app startup, based on what's in the environment variables/configuration. Many frameworks also have built-in functionality for something like that, for example: https://www.baeldung.com/spring-conditional-annotations

If I wanted to allow toggling access to the API, then I'd have an environment variable like FEATURE_API, check for it during startup and, if it isn't set to "true", just not call the code that initializes the corresponding functionality.
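To make that a bit more concrete, here's a minimal sketch of the startup check in plain Python, no framework involved. The `init_*` functions and the `FEATURES` mapping are hypothetical placeholders, and treating "true" case-insensitively is my own assumption rather than any kind of standard:

```python
import os


def feature_enabled(name, env=None):
    """Return True only if the variable is set to "true" (case-insensitive, an assumption)."""
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() == "true"


# Hypothetical initializers: in a real app these would register routes,
# start schedulers, connect to the mail server and so on.
def init_api():
    return "api"


def init_reports():
    return "reports"


def init_emails():
    return "emails"


# Hypothetical mapping of flag name -> initializer.
FEATURES = {
    "FEATURE_API": init_api,
    "FEATURE_REPORTS": init_reports,
    "FEATURE_EMAILS": init_emails,
}


def start(env=None):
    """Initialize only the features whose flag is "true" and return what was started."""
    started = []
    for flag, init in FEATURES.items():
        if feature_enabled(flag, env):
            started.append(init())
    return started


if __name__ == "__main__":
    print("started:", start())
```

Anything whose flag is missing or set to something other than "true" simply never gets initialized, so the same build can run as an API-only container, a reports-only container, or everything at once.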

It's really nice when frameworks/libraries make this obvious, like https://www.dropwizard.io/en/stable/getting-started.html#reg... but it might get harder with some of the "convention over configuration" based ones, where you have to fight against the defaults.

> I'd also imagine there would be calls between features from within the same codebase, do those become network calls?

It depends on how you architect things!

There's nothing preventing you from using the service layer pattern for grouping logic, and accessing multiple services in each of your features as needed, and poking the different bits of your data model (assuming it's all the same DB).
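As a rough sketch of what I mean, calls between features stay ordinary in-process method calls as long as everything shares one DB. The class names here are made up for illustration, and the `db` dict is just a stand-in for whatever data access layer you actually use:

```python
class UserService:
    """Hypothetical service that owns user lookups."""

    def __init__(self, db):
        self.db = db

    def email_for(self, user_id):
        return self.db["users"][user_id]["email"]


class ReportService:
    """Hypothetical service that owns order totals."""

    def __init__(self, db):
        self.db = db

    def total_for(self, user_id):
        return sum(o["amount"] for o in self.db["orders"] if o["user_id"] == user_id)


class EmailFeature:
    """A feature that uses two services through plain method calls, no network hop."""

    def __init__(self, users, reports):
        self.users = users
        self.reports = reports

    def build_summary(self, user_id):
        email = self.users.email_for(user_id)
        total = self.reports.total_for(user_id)
        return f"To: {email} | total: {total}"


if __name__ == "__main__":
    # The dict below stands in for the shared database.
    db = {
        "users": {1: {"email": "a@example.com"}},
        "orders": [{"user_id": 1, "amount": 10}, {"user_id": 1, "amount": 5}],
    }
    feature = EmailFeature(UserService(db), ReportService(db))
    print(feature.build_summary(1))
```

The services are still loaded in a container that only has FEATURE_EMAILS on; the flag just decides which entry points get started, not which code is available to call.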

If you are at the point where you need more than the same shared instance of a DB, then you'd probably need a message queue of some sort in the middle; RabbitMQ is really nice in that regard. Though at that point you're probably leaning more in the direction of things like eventual consistency and giving up using foreign keys as well.
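Just to show the shape of that handoff, here's a sketch where a direct call gets replaced by a message. Big caveat: the standard library `queue.Queue` below is only an in-process stand-in for a real broker like RabbitMQ, and the event name and function names are hypothetical:

```python
import json
import queue

# In-process stand-in for a broker such as RabbitMQ (an assumption for this sketch);
# a real deployment would use a client library and a network connection instead.
broker = queue.Queue()


def publish_report_requested(user_id):
    """Producer side (e.g. the API container): emit an event instead of calling report code."""
    broker.put(json.dumps({"event": "report_requested", "user_id": user_id}))


def consume_one(handled):
    """Consumer side (e.g. the reports container): process one message if any is waiting."""
    try:
        raw = broker.get_nowait()
    except queue.Empty:
        return False
    message = json.loads(raw)
    # The real work would happen here; this sketch only records the user id.
    handled.append(message["user_id"])
    return True


if __name__ == "__main__":
    handled = []
    publish_report_requested(42)
    while consume_one(handled):
        pass
    print(handled)
```

The producer returns as soon as the message is queued, before the consumer has done anything, which is where the eventual consistency comes from.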

> And how does this interact with your Docker Compose/single server recommendation?

Pretty nicely, in my experience!

When developing things locally, you can enable all of the needed FEATURE_* flags on your laptop, so it's more like a true monolith then.

Want to deploy it all on a single server when the scale is not too big? Do the same with Docker Compose, or maybe have separate containers on the same node, each with one of the features on, so the logs are cleaner, the resource usage per feature is more obvious, and the impact of one feature misbehaving is more limited.

The scale is getting bigger? Docker Swarm will let you scale out horizontally (or Nomad/K8s, maybe with K3s) and you can just move some of those containers to separate nodes, or have multiple ones running in parallel, assuming the workload is parallelizable (serving user API requests, vs some centralized sequential process).

At some point you'll also need to consider splitting things further in your database layer, but that's most likely way down the road, like: https://about.gitlab.com/blog/2022/06/02/splitting-database-...