| HN Mirror



	by ajmurmann 1430 days ago
	> "If a monolith can live on multiple servers, what do you call an application that can only ever live on a single server, with a single instance being launched at the same time?" That exists? Are there examples of this, especially ones where there is a good reason for this? I cannot even begin to list all the awful issues with this in my head.


KronisLV 1430 days ago

> That exists? Are there examples of this, especially ones where there is a good reason for this? I cannot even begin to list all the awful issues with this in my head.

Most certainly. I'd suggest that many systems out there that only ever needed to run on a single server are structured like this. Even though you could technically take plenty of these systems and launch two parallel instances, you'd get problems because they haven't adopted the "shared nothing" approach, or even just basic statelessness principles.

We tend to forget ourselves with all of our modern and scalable container developments, but there are untold amounts of PHP code out there that stores files and other uploaded data on the very same server, in any number of folders. Of course, you can technically set up a clustered file system, or at least a network based one, unless you are running in a shared hosting environment, in which case you are out of luck.

Oh, and speaking of shared hosting, in theory you should be able to get rid of environments that use cPanel and instead switch to containers, right? Well, no, because workflows are built around it and dozens of sites might be run on the same account with any given shared hosting provider.

You'll be lucky to even find such an environment that has an up to date version of PHP installed and running, and resource contention issues will present themselves sooner or later: "Oh hey, this one slow SQL query in this site brings down this other dozen sites. Could you have a look at it?"

I actually helped an acquaintance with that exact problem, and I regret agreeing to it, because it wasn't a good experience.

Looking at the enterprise space, I've also seen systems out there that store state (e.g. information about business rules) in the actual application memory liberally, as well as things like user session information, because someone didn't know how or couldn't be bothered to set up Redis.

So there, an app restart means that everyone gets logged out. Not only that, but say you have a system which allows users to make some sorts of requests, with business rules about what order they can be accepted in. You can store the outcome of each state in the DB, but during processing the requests sit in an in-memory queue, which means that you can't feasibly have multiple instances running in parallel, because then you'd have a split brain problem. It's like those people had never heard of RabbitMQ while designing it.
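To make the session part concrete, here's a minimal sketch. The class names are made up for illustration, and a plain dict stands in for Redis or a DB table, so treat it as a toy under those assumptions rather than how any of those systems were actually written:

```python
import secrets


class InMemorySessions:
    """Sessions live in this process only: a restart or a second instance loses them."""

    def __init__(self):
        self._sessions = {}

    def login(self, user):
        token = secrets.token_hex(16)
        self._sessions[token] = user
        return token

    def whoami(self, token):
        return self._sessions.get(token)


class SharedSessions:
    """Same interface, but the state lives in an external store handed in from outside."""

    def __init__(self, store):
        self._store = store  # hypothetical stand-in for Redis or a DB table

    def login(self, user):
        token = secrets.token_hex(16)
        self._store[token] = user
        return token

    def whoami(self, token):
        return self._store.get(token)


def demo():
    # Two "instances" with in-memory sessions: the second one does not know the user.
    a, b = InMemorySessions(), InMemorySessions()
    token = a.login("alice")
    in_memory_result = b.whoami(token)

    # Two "instances" sharing one external store: either one can serve the request.
    external = {}
    c, d = SharedSessions(external), SharedSessions(external)
    token = c.login("alice")
    shared_result = d.whoami(token)
    return in_memory_result, shared_result
```

The same reasoning applies to the in-memory queue: as long as the state only exists inside one process, a second instance simply cannot see it.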

Apart from that, there are also issues with scheduled processes. If you've never heard of feature flags or don't see a good reason to use them, you'll run into the situation where you'll have your main application instance executing scheduled tasks in parallel to serving user requests. It's worse yet if things are coupled tightly and the application does "callbacks" for reacting to certain changes, instead of passing the message through the DB or something. Oh, and in regards to performance, you'd better hope that the reporting process you wrote doesn't cause the service's GC to thrash to the point where everything slows down.
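As a rough sketch of what I mean by feature flags here: the same codebase decides at startup which roles an instance takes on. The flag names `SERVE_WEB` and `RUN_SCHEDULED_TASKS` are hypothetical, pick whatever fits your setup:

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def flag(env, name, default):
    """Read a boolean feature flag from an environment-like mapping."""
    return env.get(name, default).strip().lower() in TRUTHY


def roles_for_instance(env=None):
    """Decide which roles this instance takes on, based on its flags."""
    env = os.environ if env is None else env
    roles = []
    if flag(env, "SERVE_WEB", "true"):  # hypothetical flag, on by default
        roles.append("web")
    if flag(env, "RUN_SCHEDULED_TASKS", "false"):  # hypothetical flag, off by default
        roles.append("scheduler")
    return roles
```

With something like this you can run a handful of instances that only serve requests, plus exactly one instance with `SERVE_WEB=false` and `RUN_SCHEDULED_TASKS=true` that does the heavy reporting without slowing anyone else down.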

Oh, and in addition to that, there are hybrid rendering technologies like PrimeFaces/JSF out there, which store the user's UI state on the server (in memory), whilst sending diffs back and forth, as well as making the client execute JavaScript in the browser for additional interactivity. Think along the lines of GWT, but even more complicated and way worse. A while back some people talked about how the productivity can actually be pretty nice, but what I saw was 100% the opposite. More importantly, there's also no viable way to (easily) distribute this UI state across multiple instances, at least with the way the eldritch monolith is written. I've also seen Vaadin applications with the same problem.

Another factor that can cause situations like this to eventually develop is having a tightly coupled codebase, where you cannot reasonably extract a piece of code into a separate deployment, because it has 20+ dependencies on other services in the app and is called in about 40+ places (not even kidding). While you could try, before you know it you would be sending 20 MB of JSON for simple data fetching calls between applications (again, not kidding - once actually saw close to 100 MB of network traffic between back end services and DB calls for a page to load).

Those are just some of the issues. My suggestion would be to never build systems like that no matter how "simple" they seem, and instead just stop being lazy and use Redis, RabbitMQ, or even just PostgreSQL/MySQL/MariaDB tables for ad-hoc queues. Anything is better than writing such messes. And if you are ever asked to help someone with anything that starts looking like the above, tell them that your schedule is sadly full, or at least very carefully consider your options.


xorcist 1430 days ago

> because they haven't adopted the "shared nothing" approach,

In practice, many web applications are stateful. The load balancer would see to it that clients keep talking to the same frontend. For larger applications it is important for cache locality.

> untold amounts of PHP code out there that stores files and other uploaded data

This is quite normal when you have some type of blob, and normally what networked file systems are used for.


KronisLV 1430 days ago

> In practice, many web applications are stateful. The load balancer would see to it that clients keep talking to the same frontend. For larger applications it is important for cache locality.

In regards to front end resources, it shouldn't matter which instance you're talking to, if all web servers are serving copies of the same bundle, given that the resource hashes would match, outside of A/B testing scenarios. It's also nice to explore stateless APIs where possible, and not have to worry about sticky sessions.
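A quick sketch of why the hashes would match: if the file name is derived from the content, every server that holds the same bundle ends up serving the same name. The `hashed_name` helper is hypothetical, and real bundlers each have their own naming scheme:

```python
import hashlib


def hashed_name(filename, content):
    """Build a cache-friendly asset name from the file's content (bytes)."""
    digest = hashlib.sha256(content).hexdigest()[:12]
    stem, dot, ext = filename.rpartition(".")
    if not dot:
        # No extension: just append the digest.
        return f"{filename}.{digest}"
    return f"{stem}.{digest}.{ext}"
```

Two servers with identical `app.js` contents both produce the same `app.<digest>.js`, so it doesn't matter which one the load balancer picks.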

In many dynamically scalable setups if you tried talking to API instance #27, you might discover that it is no longer present because the group of instances has been scaled down due to decreased load. Alternatively, you could discover that the instance that you were talking to has crashed and now has been replaced by another one.

Hence, having something like Redis for caching data, or even a cluster of such services becomes pretty important! Of course, there are ways to do this differently, such as taking advantage of CDN capabilities, but for the most part sticky sessions are a dated approach in quite a few cases. It's easier for everyone not to care about ensuring such persistence.

One notable exception to this: geographically distributed systems, where even if you don't care about that exact instance, you still want stuff in this data center to be reached, instead of stuff halfway across the world.

> This is quite normal when you have some type of blob, and normally what networked file systems are used for.

Nowadays, I'd argue that S3 (or compatibles like MinIO or Zenko) is one of the very few ways to do this properly, or perhaps GridFS in MongoDB - an abstraction on top of the file system, that handles storing and accessing data as necessary. Then, using a distributed or networked file system, or block/object storage (depending on the setup) is a good idea.

However, in general cases, you should never use the file system directly for the storage of your blobs, regardless of whether those are stored locally or in a networked file system, as that is just asking for trouble. Think of things like maximum files per folder, inode limits, maximum folder nesting/file name length limits, maximum file sizes, writing bad code that allows browsing other directories than the intended ones, the risk of files that might be executed in the case of bad code/configuration, case sensitivity based on the file system, encoding issues, special characters in filenames or directories, the need to escape certain characters, reserved names in certain file systems, and frankly too many other issues to list here.
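To illustrate one way an abstraction sidesteps most of that list: never let the uploaded file name become the storage key. Below is a minimal sketch where plain dicts stand in for the object store and the metadata table, and `blob_key` and `store_upload` are hypothetical helpers, not any particular library's API:

```python
import hashlib


def blob_key(content):
    """Derive a storage key from the content alone, never from the uploaded name."""
    digest = hashlib.sha256(content).hexdigest()
    # Two levels of sharding keep any single "folder" from growing without bound.
    return f"{digest[:2]}/{digest[2:4]}/{digest}"


def store_upload(store, metadata, original_name, content):
    """Save the blob under a generated key and keep the user's file name as data."""
    key = blob_key(content)
    store[key] = content  # stand-in for an S3-style put
    metadata[key] = {"original_name": original_name, "size": len(content)}
    return key
```

The key only ever contains hex characters and slashes, so path traversal, case sensitivity, reserved names and odd encodings stop being storage problems. One caveat of this particular sketch: identical content maps to the same key, so you'd want the metadata to live per upload rather than per key in a real system.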

So yes, it is "normal" but that doesn't make it okay, though one also has to understand that often in a shared hosting environment there aren't good options on offer, versus just spinning up a MinIO container and using the S3 library in your app.
