So if you were backing up your data to Tarsnap, then you'd be up and running as quickly as you could launch a new instance and redownload everything. And $500 credit is enough to power a micro droplet for 100 months, or a small droplet for 50 months. DO handled this well.
So this is a technical problem I am having right now that's preventing me from backing up a Postgres database completely (hope someone here can help).
I have a master Postgres database that is receiving a TON of transactions per second (I'm talking about thousands of concurrent transactions). We tried running pg_dump on this database, but the DB is just too huge, and it took more than 4 days to dump everything. Not only that, but it impacted performance to the point where backing it up this way was just not feasible.
No problem.. just create a slave-DB and run pg_dump on that, right? We did just that, but the problem is that you can't run long running queries on a hot standby (queries that take more than a minute).
What would you do in my scenario? With the hot standby, I technically am backing up my data, but I would have 100% peace of mind if I could take daily backups in case someone accidentally ran a "DROP DATABASE X", which would delete the database on the hot standby/slave as well.
Alternately, you can issue pg_start_backup('label'), backup the filesystem, then issue pg_stop_backup() and keep all the WAL logs from that time. That'll get you a base backup similar to the slave.
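A minimal sketch of that sequence in Python, in case it helps. The two callables, run_sql and copy_data_dir, are hypothetical stand-ins for however you talk to the database and copy the files (a cursor's execute method and an rsync or tar wrapper, say); only pg_start_backup and pg_stop_backup come from the comment above. The one design point is the try/finally, so the server leaves backup mode even when the copy fails.

```python
from typing import Callable


def take_base_backup(
    run_sql: Callable[[str], None],
    copy_data_dir: Callable[[], None],
    label: str = "nightly",
) -> None:
    """Wrap a filesystem copy in pg_start_backup / pg_stop_backup.

    run_sql and copy_data_dir are hypothetical callables supplied by the
    caller; this function only fixes the order of the three steps.
    """
    # Double any single quotes so the label is a valid SQL string literal.
    safe_label = label.replace("'", "''")
    run_sql(f"SELECT pg_start_backup('{safe_label}')")
    try:
        copy_data_dir()
    finally:
        # Always leave backup mode, even if the copy raised.
        run_sql("SELECT pg_stop_backup()")
```

The WAL generated between the two calls still has to be archived separately, as the comment says; this sketch does not handle that part.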
What I'm doing is this:
I've got a primary/hot spare pair, and a tertiary db on lesser equipment that's my second copy for cases where I have one of the main machines down or I have to rebuild the secondary from the primary.
The tertiary db ships logs to s3, after gpging them. Every $timeframe, I take a base backup and throw it up as well. I keep a couple, and delete the older ones. Every few months, I test a restore on ec2. There's a balance between the WAL logs that you need to keep, the time to restore, and the frequency of base backups.
[edit - parameter names. Further edit - strategy.]
Inside postgresql.conf for the slave I have the following:
# These settings are ignored on a master server.
hot_standby = on # "on" allows queries during recovery
# (change requires restart)
max_standby_archive_delay = 900s # max delay before canceling queries
# when reading WAL from archive;
# -1 allows indefinite delay
max_standby_streaming_delay = 900s # max delay before canceling queries
# when reading streaming WAL;
# -1 allows indefinite delay
#wal_receiver_status_interval = 10s # send replies at least this often
# 0 disables
#hot_standby_feedback = off # send info from standby to prevent
# query conflicts
So I set it to 15 minutes for this specific backup server which I am okay with. I already have another server with much shorter time delays.
It's sacrificing the expected latency of replication.
Incidentally, if you're on 9.3 and your HW can handle it, take a look at parallelizing the pg_dump. If you've got a relatively fast disk subsystem and many cores, you can get a speedup. I've found it tends to make the dumps O(biggest table) instead of O(sum of all tables).
(It's native on 9.3, I've hacked up some scripts that do it for 9.0, but they don't get a consistent snapshot, so I do it during scheduled downtime. OTOH, the dump/restore is ~6x faster OMM/OMD, so the downtime is that much shorter)
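For anyone who wants to try the native parallel dump, here is a small sketch that builds the command line. The database name and output path in the usage note are placeholders, and defaulting the job count to the number of cores is my assumption, not a recommendation from the comment above. Parallel mode needs the directory output format, hence -Fd.

```python
import os
from typing import List, Optional


def parallel_dump_command(
    dbname: str, out_dir: str, jobs: Optional[int] = None
) -> List[str]:
    """Build a pg_dump argv that dumps tables with several worker jobs.

    Parallel dumps require the directory output format (-Fd), so out_dir
    is a directory that pg_dump creates, not a single file.
    """
    if jobs is None:
        # Assumption: one job per core is a reasonable starting point.
        jobs = os.cpu_count() or 1
    if jobs < 1:
        raise ValueError("jobs must be at least 1")
    return ["pg_dump", "-Fd", "-j", str(jobs), "-f", out_dir, dbname]
```

You would hand the result to something like subprocess.run(parallel_dump_command("mydb", "/backups/mydb.dir", jobs=4), check=True). Each job opens its own connection, so the load on the server goes up accordingly.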
Was the database designed using transactions to achieve consistency? If so, then you can just instruct Tarsnap to back up the folder containing your database every day, and you're done.
If the DB uses transactions for consistency, you can copy it at any time without any problems.
... what? The point of consistency is that if the power to your server is cut, then you can reboot and pick up precisely where you left off. That means the database on disk must have consistency. Meaning you should be able to copy it at any point in time without any problems. If you can't, then that's not consistency, and if postgres really works that way, then it's failing one of the basic tenets of being a database. http://en.wikipedia.org/wiki/ACID
Any database that purports to have consistency must be able to withstand cutting the power to the server at any time. And if it can do that, then it must be true that you can copy the database folder at any time, too, without any special commands. (pg_start_backup is not issued before every power loss, so why would it need to be issued before a copy?)
On the other hand, if postgres doesn't support consistency, then that'd be a major reason not to use it.
EDIT: I'd run the server in a VM and back up VM snapshots. VMware makes this painless (and the snapshotting process is designed to have minimal impact on disk I/O performance for precisely the scenario the OP described). VirtualBox probably has something similar. These replies seem crazily overcomplicated in comparison.
Unless tarsnap does something like LVM snapshotting, then it's not going to get a consistent snapshot. You can't just copy the directory of an active server.
Leaving aside the management issues of huge vm images and the less than ideal io performance, the ACID guarantees of pg rely on the underlying hardware obeying some specific restrictions, including real fsync and not lying about when things are on permanent storage. Getting the drives and raid controllers to obey that has historically been a difficult, ongoing job that has to be redone with each new generation of hardware. SSDs have been particularly interesting with that, the actual flush to disk can be quite delayed from the logical write. Some have supercaps, some don't. Those that don't are vulnerable to power losses while the data is still in the drive's ram awaiting a block erase and write. The IDE drivers used to flat out lie. Enterprise SAS drives often come with the write caching turned on (since it looks better in benchmarks) even though they're often times used behind a battery backed raid controller.
Adding a VM layer to that just to get snapshots seems overly complicated and prone to issues.
If you're taking an instantaneous snapshot of the system then yes. A standard copy/rsync/etc. isn't going to give you that. If the copy takes a long time, at what point do you grab the pg_xlog directory? And are all the files you need still there?
ACID doesn't apply since you can't copy a large file in an instant. The copy takes time, in which time the files on disk can change. This isn't the same thing as the server losing power.
VM snapshots, zfs snapshots, etc are the way to go.
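To make the "copy takes time" point concrete, here is a toy model, not how Postgres stores anything: two "files" hold account balances, and a transaction moves money between them while a file-by-file copy is in progress. All names here are made up for illustration.

```python
def slow_copy(files, write_between):
    """Copy files one at a time, letting a writer run after each file."""
    copy = {}
    for name in sorted(files):
        copy[name] = files[name]
        write_between(files)  # the live database keeps changing
    return copy


def transfer(files, amount=10):
    """A 'transaction' that moves amount from account a to account b."""
    files["a"] -= amount
    files["b"] += amount


live = {"a": 100, "b": 100}
snapshot = dict(live)               # instantaneous: one point in time
copied = slow_copy(live, transfer)  # file-by-file: races with the writer

print(sum(snapshot.values()))  # 200
print(sum(copied.values()))    # 210
```

The snapshot preserves the invariant (total of 200); the slow copy captures "a" before a transfer and "b" after it, so it holds 210, a state the live system was never in. A power cut freezes every file at the same instant, which is the case crash recovery is built for.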
There's something that works and there's the right way to do it. It's better to do things the right way if you want to make sure everything is in a good state when you bring it back and there aren't edge cases you missed ... What if someone forgot to use a transaction?
The start/stop backup has to be issued on the master. It doesn't look like the standby gets the backup label (at least on 9.0, may have changed since). So you'd have to be reading from the master's data directory.
Alternately, you could stop the secondary and pull from there. But that interrupts the replication, and then the secondary would have to catch up, which might be hard depending on your level of usage.
What filesystem are you running on? Can you snapshot it outside of the postgres environment? The database may be mid-transaction at that point, but it's still better if it does log replay at startup, than losing all the data.
Also if your filesystem snapshots can be exposed as files / block devs, you can rsync them to another host lowering the amount of transferred data (keep the previous copy so rsync will only copy the blocks that differ).
Just a thought... If your storage layer has support for taking a consistent snapshot of your file system then you might be able to use this to get a backup.
You would get a copy of your database that you would need to run log-replay recovery on but after that it should be all good.
pg_dump is a logical backup; that is, as you've seen, it queries all of the data in your database and writes it to a file in the form of queries that will re-create all of your data in a new database. This is great and very flexible, but as you've seen, it has some limitations.
You probably want to look into physical backups, where you basically copy the actual files that postgres is actually using to store your databases on disk (although it's not quite that simple, so do some googling on it). This has the nice advantage of not requiring you to run queries against your database to back it up. It also gives you a consistent point-in-time backup of your database.
The abrasive headline is kind of unfortunate, as the actual moral of the story given at the end is exactly the right takeaway: Never assume your hardware is infallible, so always have backups that you know you can use when your server experiences a wildly improbable catastrophe.
Also, very impressed by Digital Ocean's response here. Given their reputation as a budget host, they really do put a lot of effort into service.
Or an extremely probable one like a hard disk failure. They only last a few years; most data centers see an annual replacement rate in the 2-13% range. The failure rate is a known quantity, and the limited 1-3 year warranties reflect that expectation.
There isn't a host I've used more than a few years where I haven't seen hard drives (and power supplies) fail. I don't know if my experience is typical, but hardware RAID controllers seem to go bad on me not-infrequently too, losing the whole array at once. They don't pay you when it happens, they just replace it. DO was extremely generous here.
Was going to say the same thing, Dual drive failure on a RAID5 system with five 2TB drives is 1 in 12. With 3TB drives that goes up to 1 in 7.
The underlying issue is that the uncorrectable read error rate is 1 in 10^15 bits; this is just physics (thermal noise, read/write signal loss, etc.). But with 8b/10b encoding that is only 90TB worth of bits. Rebuilding a RAID group of 5 with four 2TB "good" drives (8TB of data to be read), you will see a failure in one of the other 4 drives 1 in 11.25 times (90/8). With 3TB drives, 1 in 7.5 times (90/12). Using simple mirroring, you won't be able to re-silver a mirror 1 in 45 times, or slightly more than 2% of the time, for 2TB drives.
Dual parity, or triple mirrors (x3) are now the minimum bars for making storage reliable.
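The arithmetic above is easy to replay for other drive sizes. This sketch only reproduces the comment's rule of thumb (90TB read per unrecoverable error, divided by the data that must be read during the rebuild); it is a linear approximation, not a rigorous probability model, and the function name is mine.

```python
URE_BUDGET_TB = 90.0  # the comment's figure: data read per unrecoverable error


def rebuilds_per_failure(
    surviving_drives: int, drive_tb: float, budget_tb: float = URE_BUDGET_TB
) -> float:
    """Return N such that roughly 1 in N rebuilds hits a read error."""
    data_read_tb = surviving_drives * drive_tb
    return budget_tb / data_read_tb


print(rebuilds_per_failure(4, 2))  # 11.25  (RAID5, five 2TB drives)
print(rebuilds_per_failure(4, 3))  # 7.5    (RAID5, five 3TB drives)
print(rebuilds_per_failure(1, 2))  # 45.0   (two-way mirror, 2TB drives)
```

Plugging in larger drives shows why the odds get worse quickly: the budget stays fixed while the amount to be read grows with capacity.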
It's because some startups have developers that open w3schools, start typing examples, and somehow ship a quasi-working proof-of-concept that goes into production.
There's a bit of "if it ain't broke don't fix it" here, but a whole lot of "get with the program" still required.
Well as a professional Systems Administrator, it pisses me off more than it probably should. It's like you want to know why I'm worth what I'm asking because when your shit falls down and goes boom, I'll get you back up and operational in minutes or an hour.
Because it's my fucking job to help you manage your IT risks. Azure, Heroku, AWS aren't replacements for Systems Administration, they're just tools in my arsenal. I don't understand the mentality it takes to go into business (beta or not) without having SOME understanding of your risk. The fact that DO paid you a not insignificant amount due to downtime, means you're damn lucky.
Do you know of anyone who didn't get deadly serious about backups before they had a sour taste of data loss?
Me, I was just lucky my first really interesting experience was on a big UNIX(TM) Version 6 system, with a couple of user accessible DECTapes. Buying a tape was cheap enough, and the whole thing was neat ... and then I learned the -rf flags to rm. And had any critical data I lost on that DECTape.
Today I do nightly backups of my home systems to LTO-4 tapes (as well as offsite of the most critical to rsync.net a time zone away).
Yes, of course. A full fledged sysadmin is expensive, and startups will typically make several costly mistakes before going to that expense.
This is not surprising, and is not even regrettable. If the business can't support the overhead of someone who doesn't directly bring in revenue, then it can't. And if there's a large investment that makes good infrastructure engineering possible, first-time entrepreneurs might not realize that they need that function.
The key to long term success is in realizing what you will need before it's too late to get it.
It's great you had backups, but why a write-up? Is it an attempt to smear DO's otherwise good name? It's an un-managed VPS so it's your responsibility to keep backups of your box, not theirs. And hardware fails all the time, so you can expect this to happen anywhere.
> And if you just launched and have a single instance running, let your alpha users know that there will probably be some downtime.
That's true. But there's no reason for extended downtime even if that instance goes down. Make sure your whole setup is described in chef/puppet/salt/ansible/cf/whatever and even a rebuild from scratch takes only minutes then. There's really little reason to skip that these days.
DO is affordable enough that the minimum you should run is 2 droplets. Having said that, I'm actually fairly impressed with the $500 credit, and now you have no excuse not to run 2 VMs. Consider it a lesson learnt.
DigitalOcean's pricing page indicates that "All cloud hosting plans include automated backups". (https://www.digitalocean.com/pricing) From the email you received, it sounds like this is clearly not the case. I wonder what other claims DigitalOcean is making that are not true.
There is an automated backup system that you have to enable for a droplet, that creates a snapshot every few days. It's a clear part of a droplet's control panel. They began charging for it in July 2013. The price is 20% of the droplet's monthly cost. Sounds like they need to update their pricing page.
This is pertaining to a droplet feature though, and not some low-level backup system. Meaning, it's not as if they're lying about the infrastructure below what a normal customer can see. They just have an erroneous pricing page.
This might sound a bit glib, but raid 5 shouldn't really be used in modern storage.
If you ignore the performance issues (which can vary by device), it's just not safe. Depending on the size of the drive, it can take anywhere up to 30+ hours to rebuild.
Bear in mind that you tend to use disks that are all from the same batch; it leaves you in the danger zone for far too long.
Your options are:
Some sort of clever RAID (ZFS type thing)
Another type of clever RAID (Like the LSI chunk thingy in the DCS37000)
RAID 10
For SSDs, where the time-to-read/write-full-capacity is typically much less than HDDs (both due to higher speed & lower capacity), it can be less of a poor decision. SSDs also have somewhat more advanced machinery for data integrity checking and slightly friendlier failure modes (e.g., the sectors "wear out" over time, but the firmware tends to warn you as that starts to happen, and you're not going to hit a sudden mechanical failure).
Was this really a dual drive failure, or was this the rather common single drive failure plus undetected errors on a backup drive, that show up when trying to rebuild?
Because that happens a lot, and it's very important to do a full read of every drive in the array at least weekly! You have two options for doing that:
If you are using linux md raid then run the "check" command, which automatically does the test using background I/O (but does still impact things). On debian, and perhaps other distros too the mdadm command will do it every month by default. Make sure to set a minimum speed or it might never finish if you have a busy system.
You can also use the built in SMART on the disk to do a long self test. This also uses background I/O and I think it has a bit less impact on existing operations. (But you have to have some idle time on the disk or it will never finish.) If you install smartmontools you can set smartd to do this test for you every week, and keep an eye on the results.
I personally do both, plus a short self test of the disk every night.
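For the md side, the "check" and the minimum speed both come down to writing a value into a kernel file. A rough sketch, assuming an array named md0 and a floor of 10000 KB/s (both placeholders, pick your own); it separates planning from doing so you can print the plan before running anything as root.

```python
from pathlib import Path
from typing import List, Tuple


def plan_md_check(
    array: str = "md0", min_speed_kb: int = 10000
) -> List[Tuple[Path, str]]:
    """Return the (file, value) writes that start a scrub of one md array."""
    return [
        # Floor for check/resync throughput, in KB/s per device.
        (Path("/proc/sys/dev/raid/speed_limit_min"), str(min_speed_kb)),
        # Writing "check" asks the md driver to read and compare every stripe.
        (Path("/sys/block") / array / "md" / "sync_action", "check"),
    ]


def apply_plan(plan: List[Tuple[Path, str]]) -> None:
    """Perform the writes; needs root on a machine that has the array."""
    for path, value in plan:
        path.write_text(value + "\n")
```

Progress shows up in /proc/mdstat while the check runs. The SMART long test is a separate mechanism handled by smartmontools and is not covered here.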
I truly believe that we did the best we could in this instance. Drive failures are always unfortunate; even with backups, downtime exists.
That being said, we're always genuinely looking to improve, and I'd welcome your feedback on how you feel we did and how you feel we could do better. Please do reach out to me personally john@do! Thanks. :)
With that being said, these days it's a good idea to use a deployment tool or configuration management system like puppet/salt/ansible/chef/etc, especially in a virtualized environment. This will help with scalability as well as situations such as these.
This is the reason why I moved all data away from my server instances. My images are hosted by cloudinary(with s3 bucket backup) and my databases are Amazon RDS instances.
I don't care if a server goes down, I can launch a new one in a matter of minutes (with ansible) without any data loss.
Which of those things you named is protecting you from losing your database? I paid the uber-high fees for RDS with Multi-AZ failover and... well... it failed, then didn't fail over to another AZ. The instance ended up down for hours before they recovered it. That's when I jumped ship from AWS, wrote off the reserved instance payments, moved the database to some rented servers at SoftLayer, and handle nightly off-site backups myself. Not only do I have working backups and failover, but 4-8x the capacity per dollar.
The author is sweet, his conclusion was "always backup your data" if it was me I would probably say "I'm moving away, will never trust them again on my data" ..
Is there a provider that credibly offers high availability Linux servers? Disks fail, capacitors fail, power fails, network equipment fails (a lot). I'm sure it's possible to build an ultra-reliable server that mitigates all that but I doubt it would be worth the money.
At $5/month, I think it's not too much of an investment to have some basic redundancy if you care about your data. Anyway, if your data matters to you, do backups.
The $500 credit from DO is quite reassuring. Usually if the HD fails and your data is lost, you're out of luck. I've heard the "horror" stories of some hosts reusing consumer hard drives between servers, so I learned: your data is your responsibility. I'm glad the OP had backups, but these failures happen; thankfully DO had the business sense to compensate them.
Seems like good advertising for DO, as any knowledgeable system admin knows drives fail. DO could have done nothing.
Nice move from DO to give everyone $500 credit. As I remember, they don't guarantee data safety (you still need backups even if they did). Double disk failure is a rare thing, but it happens.
Backup the data and configuration information to an object store (AWS S3), use configuration management tools so you can programmatically provision a new server (dedicated or virtual, doesn't matter) in the event of failure. Provisioning should include functionality to deploy your application, and to restore your data to whatever data storage application (SQL, NoSQL, etc) you're using.
If you have questions, I'm more than happy to provide free advice.
http://www.tarsnap.com
EDIT: s/years/months/g. Thanks.