by solatic 3176 days ago
OP is offering some very dangerous advice.

Twenty years ago, software was hosted on fragile single-node servers with fragile, physical hard disks. Programmers would read and write files directly from and to the disk, and learn the hard way that this left their systems susceptible to corruption in case things crashed in the middle of a write. So behold! People began to use relational databases which offered ACID guarantees and were designed from the ground up to solve that problem.

Now we have a resource (spot instances) whose unreliability is a featured design constraint and OP's advice is to just mount the block storage over the network and everything will be fine?

Here's hoping OP is taking frequent snapshots of their volumes because it sure sounds like data corruption is practically a statistical guarantee if you take OP's advice without considering exactly how state is being saved on that EBS volume.


Your response is fairly ridiculous.

A spot instance interruption isn't a system crash, it's a shutdown signal. Storing your important spot instance data on EBS is recommended by AWS. If your application can't handle a normal system shutdown without losing data, your application is at fault, not your system setup.

>exactly how state is being saved on that EBS volume

Files are written to a filesystem which is cleanly unmounted at shutdown when interruption happens.

And even if that wasn't true, network-attached storage (unlike local storage) has no semantics for communicating a "partially completed" write of a block. Your server either manages to send an iSCSI packet to the SAN with a completed checksum, or it doesn't. Which means that—for the problems that would arise from a sudden power-cut to a VM (let's say from unexpected hypervisor failure)—using a journalling filesystem on your network disks would perfectly compensate for those problems.
Common filesystems only do metadata journaling, so your file contents are not protected by this. As an exception, the ext3 and ext4 filesystems support a data journaling mode using a special flag.

Even if you had data journaling, it wouldn't give you consistency between different files. This post used GitLab as an example, and git will break if some files in its database are updated but others are not. Git doesn't use fsync to enforce the order of its updates; I don't know whether GitLab enables it or whether the performance hit is reasonable.

Partially completed write of a block, sure. But partially completed write of a file?

I can imagine (cough) an application where the application is trying to write some binary blob to disk, doesn't finish before shutdown, and upon reboot, tries to load the binary blob back into memory, fails because the binary blob isn't consistent, doesn't handle the failure well, and refuses to boot.

App's fault? Sure. Does the customer care at 2 am? Nope.
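
For what it's worth, the usual way an application protects itself against the half-written blob described above is to write to a temporary file, fsync it, and rename it over the old one. A minimal sketch, assuming a POSIX filesystem where rename within one directory is atomic; `atomic_write` is a hypothetical helper name, not something from GitLab or the original post:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace `path` with `data` so readers see the old or the new file, never a mix."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live in the same directory (same filesystem) as the
    # target, otherwise the final rename is not atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force file contents to stable storage
        os.replace(tmp_path, path)  # atomic swap on POSIX
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    # fsync the directory so the rename itself survives a crash (POSIX only).
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

If the instance dies mid-write, the reboot finds either the previous blob or the new one, plus at worst a stray `.tmp-` file. It does nothing for consistency across several files, which is the harder problem raised earlier in the thread.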

Then all you're saying over and over is that in your imagination, not using a long running instance is very dangerous because rebooting exposes the fragility of your app.

Honestly, it's much safer in that circumstance to have a frequently rebooting instance because it will quickly expose your app's fragility during normal operations instead of that fragility being exposed in a disaster.

> it's much safer in that circumstance to have a frequently rebooting instance

I actually happen to agree with you in principle on this, and it's at the root of my current side project.

But sometimes you just don't have the flexibility to fix or replace the app. Ops engineering, like any other kind of engineering, is about dealing with real-world constraints and making the most of the resources you have. Most apps, on some notion of a fragility spectrum, are far closer to fragile than to antifragile, because fragile is the default, and extensive stress-testing to understand and plan for all failure modes before a production deployment isn't typically feasible. At that point, if you can't fix it, you have to work around it.

All you're doing is advocating larger, less frequent failures with people who know less. Robustness isn't just about your software or your ops setup, but also about your people and their knowledge and experience. I cannot see how less frequent, more intense failures handled by people who know less are preferable, or how anything else is "very dangerous advice".

You will ultimately have many fewer resources available if your strategy is to gloss over failure modes by telling inexperienced engineers to hope they won't happen. It's technical debt and the interest payments are very high.

> If your application can't handle a normal system shutdown without losing data, your application is at fault, not your system setup.

Unless something in the system shutdown fails to give the application what it needs (for instance, time) to shut down cleanly. That is entirely possible, considering that Amazon sells you the spot instance on the assumption that it can give the hardware at any time to somebody who is willing to pay more. Nowhere in its documentation does Amazon guarantee a spot instance the time needed for a clean shutdown (only that a two-minute warning will be available via their proprietary mechanism, if you architect your application to monitor for it), and you would be ill-advised not to architect for that.
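
Monitoring for that warning could look roughly like the sketch below. It assumes the instance metadata endpoint `spot/termination-time`, which returns 404 until the instance is marked for termination, and it assumes unauthenticated (IMDSv1-style) metadata access is enabled. The function names and the `on_notice` callback are hypothetical:

```python
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

# Assumed endpoint: answers 404 until a termination time has been set.
TERMINATION_URL = "http://169.254.169.254/latest/meta-data/spot/termination-time"


def fetch_termination_time(url: str = TERMINATION_URL,
                           timeout: float = 1.0) -> Optional[str]:
    """Return the termination timestamp, or None if no notice has been issued."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode().strip() or None
    except urllib.error.HTTPError as exc:
        if exc.code == 404:
            return None  # not marked for termination yet
        raise
    except OSError:
        return None  # metadata service unreachable; treat as "no notice"


def wait_for_notice(fetch: Callable[[], Optional[str]],
                    on_notice: Callable[[str], None],
                    interval: float = 5.0,
                    max_polls: Optional[int] = None,
                    sleep: Callable[[float], None] = time.sleep) -> bool:
    """Poll `fetch` until it reports a notice, then call `on_notice` once."""
    polls = 0
    while max_polls is None or polls < max_polls:
        when = fetch()
        if when is not None:
            on_notice(when)  # e.g. stop accepting writes and flush state
            return True
        polls += 1
        sleep(interval)
    return False
```

On an instance this would be started as `wait_for_notice(fetch_termination_time, drain_and_flush)`, where `drain_and_flush` is whatever your application needs to do in the time it has left.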

> Storing your important spot instance data on EBS is recommended by AWS

Because EBS itself is reasonably reliable. If you have configuration data (i.e. in /etc) for a legacy application that isn't managed, it's reasonable to mount that data on EBS since it's rarely written to and writes are generally human-initiated and human-monitored (with operations policy possibly mandating a snapshot even before any changes are made).

That's still very different from daemon writes to /var. Take, for instance, the PostgreSQL documentation, which warns that snapshots must include the WAL logs in order for the snapshot to be recoverable, and that it is quite difficult to restore from a snapshot if you stored your WAL logs on a different mount: https://www.postgresql.org/docs/10/static/backup-file.html

You need to understand precisely how your application is treating your storage and act accordingly. Thinking that all applications interact with storage the same way is dangerous and liable to cause data corruption and loss. That's all.

Spot instances are shut down cleanly via the usual stop semantics (which includes all the shutdown handlers provided your OS supports them). Assuming your database software supports clean shutdowns via SIGTERM, everything should be fine.
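
As an illustration of what "supports clean shutdowns via SIGTERM" means for an application, here is a minimal sketch of a worker that finishes its current step and flushes state when the init system sends SIGTERM. `GracefulStop`, `run`, `work_step`, and `flush_state` are hypothetical names, not taken from any particular database:

```python
import signal
import time
from typing import Callable


class GracefulStop:
    """Signal handler that only records that a stop was requested."""

    def __init__(self) -> None:
        self.requested = False

    def __call__(self, signum, frame) -> None:
        # Keep the handler trivial; the real work happens in the main loop.
        self.requested = True


def run(work_step: Callable[[], None],
        flush_state: Callable[[], None],
        stop: GracefulStop,
        pause: float = 0.1) -> None:
    """Repeat work_step until a stop is requested, then flush state exactly once."""
    while not stop.requested:
        work_step()
        time.sleep(pause)
    flush_state()  # e.g. fsync files, close database handles


def install(stop: GracefulStop) -> None:
    # SIGTERM is what init systems typically send first during shutdown.
    signal.signal(signal.SIGTERM, stop)
    signal.signal(signal.SIGINT, stop)
```

This only helps if `flush_state` finishes before the init system or the hypervisor runs out of patience.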
> Assuming your database software

You're assuming that people are saving their state in databases to begin with. If you're saving state to a database in production, typically you're communicating with that database over a network connection, and not running the database on the same machine as your application. Containerizing databases is a whole separate issue.

OP's specific example is saving /var/opt/gitlab to an EBS volume and expecting to be able to move it from one spot instance to another without corruption. That strikes me as insane.

What is so insane about this? It's no different than plugging in a USB drive, modifying some data on it, then disconnecting. Except in this case, the mount/unmount happens outside of the application's lifecycle so it can initialize and shut down cleanly without worry.

Why? The gitlab init script to stop it is being run. It's a clean shutdown.

What happens if something causes it to hang? Presumably EC2 will time it out at some point.

And if GitLab (or whichever other application) is hanging and the stop script fails to cleanly shut down the application?

Shit happens at scale; that's precisely why ACID guarantees are important. Specifically in GitLab's case, because configuration is stored under /etc/gitlab, relying on EBS snapshots as a safeguard against corruption only works if the snapshot is taken of the entire FS, not just /var/opt/gitlab. If your machine is properly provisioned from an AMI or at least from some kind of configuration management, and you have some kind of reasonably-enforced policy which only permits changes through those management systems, then maybe you can get away with only taking a snapshot of /var/opt/gitlab, but now we're getting into the territory of "I understand how my data is being stored to the EBS volume (in this case, according to documented GitLab instructions) and I am acting accordingly". Then, if the /var/opt/gitlab snapshot ends up being corrupted, the odds of getting an uncorrupted snapshot increase the more snapshots you try, and this is probably good enough in this specific instance because if you needed a better guarantee than that, you'd have a proper HA setup.

This pattern is a lot safer if you use ZFS. Spot instances don't just disappear though, you get notification and have a chance to perform shutdown actions, except in the case of hardware failure - which is the same with non-spot instances.
- EBS, being block storage, doesn't recognize the filesystem format on top of it, so it doesn't know whether you formatted the volume as ZFS and will not use ZFS snapshots when you use Amazon's native EBS snapshotting. If you wish to use ZFS snapshots, you have to build that on top of what Amazon gives you, along with all the other aspects of ZFS storage, i.e. building a ZFS storage pool from separate EBS volumes. I mean, it would be nice if Amazon had a hosted ZFS solution, but so far it doesn't seem like they do.

- Yes, you get a notification, but it's a proprietary notification scheme that your application must be designed to poll for. Why can't Amazon use standard signals like SIGPWR to indicate imminent shutdown?

- Just because it isn't smart for non-spot instances doesn't suddenly make it smart for spot instances ;)

SIGPWR is anything but standard, and it's unclear how AWS would even send that signal to your processes without adding an agent to the instance.

Currently they initiate an ACPI shutdown event at the termination time. It's hard to initiate a shutdown in a more standardized manner. An instance shut down via this signal will generally see the init process begin gracefully stopping services, eventually halting on its own. Typically your init process will get increasingly aggressive with kill signals, as defined by your service definitions, eventually getting to SIGKILL. If your init process fails to get the vCPU halted, after an (undocumented?) period AWS will halt the CPU(s) for you. This is about as graceful a shutdown as you're going to get with 'standard' interfaces.
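
The escalation an init process performs can be sketched in a few lines. This is an illustration of the general SIGTERM-then-SIGKILL pattern on POSIX, not the behaviour of any specific init system; `stop_service` and `grace_seconds` are hypothetical names:

```python
import subprocess


def stop_service(proc: subprocess.Popen, grace_seconds: float) -> str:
    """Ask a child process to exit, and force it if it overstays the grace period."""
    proc.terminate()  # sends SIGTERM on POSIX: a polite request to exit
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX: cannot be caught or ignored
        proc.wait()  # reap the process so it does not linger as a zombie
        return "killed"
```

A service that lands in the "killed" branch never got to flush anything, which is exactly the case where the state on the volume is suspect.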

Termination Notifications go out of their way to give you an extra heads up, in case your application is unlikely to gracefully handle being shut down by the init system. Think DB hosts with a crapload of dirty blocks that take a few minutes to sync to disk at shutdown.