HN Mirror

by Rezo 3305 days ago

Here are some simple, practical tips you can use to prevent this and other Oh Shit Moments(tm):

- Unless you have full time DBAs, do use a managed db like RDS, so you don't have to worry about whether you've set up the backups correctly. Saving a few bucks here is incredibly shortsighted: your database is probably the most valuable asset you have. RDS allows point-in-time restore of your DB instance to any second during your retention period, up to the last five minutes. That will make you sleep better at night.

- Separate your prod and dev AWS accounts entirely. It doesn't cost you anything (in fact, you get 2x the AWS free tier benefit, score!), and it's also a big help in monitoring your cloud spend later on. Everyone, including the junior dev, should have full access to the dev environment. Fewer people should have prod access (everything devs may need for day-to-day work like logs should be streamed to some other accessible system, like Splunk or Loggly). Assuming a prod context should always require an additional step for those with access, and the separate AWS account provides that bit of friction.

- The prod RDS security group should only allow traffic from whitelisted security groups also in the prod environment. For those really requiring a connection to the prod DB, it is therefore always a two-step process: local -> prod host -> prod db. But carefully consider why you are even doing this in the first place. If you find yourself doing this often, perhaps you need more internal tooling (like an admin interface, again behind a whitelisting SG).

- Use a discovery service for the prod resources. One of the simplest methods is to set up a Route 53 Private Hosted Zone in the prod account, which takes about a minute. Create an alias entry like "db.prod.private" pointing to the RDS instance and use that in all configurations. Except for the Route 53 record, the actual address for your DB should not appear anywhere. Even if everything else goes sideways, and you've assumed a prod context locally by mistake and run some tool that is pointed to the prod config, the address doesn't resolve in a local context.
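
As a rough illustration of that last tip, here is a minimal Python sketch of a tool checking whether the prod name resolves before it does anything. "db.prod.private" is the example name from above; the function name `resolves` and its injectable `resolver` parameter are hypothetical, added only so the check can be exercised without touching real DNS.

```python
import socket


def resolves(hostname, resolver=socket.getaddrinfo):
    """Return True if hostname resolves, False if name resolution fails."""
    try:
        # Port is None because only the name lookup matters here.
        resolver(hostname, None)
        return True
    except socket.gaierror:
        return False
```

A script could call `resolves("db.prod.private")` at startup and exit early when it returns False, which is what you would expect on a laptop outside the prod VPC if the name only exists in the private hosted zone.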

3 comments

unoti 3305 days ago

You made a lot of insightful points here, but I'd like to chime in on one important point:

> - Unless you have full time DBAs, do use a managed db like RDS, so you don't have to worry about whether you've set up the backups correctly.

The real way to not worry about whether you've set up backups correctly is to set up the backups, and actually try and document the recovery procedure. Over the last 30 years I've seen situations beyond counting of nasty surprises when people actually try to restore their backups during emergencies. Hopefully checking the "yes back this up" checkbox on RDS covers you, but actually following the recovery procedure and checking the results is the only way to not have some lingering worry.

In this particular example, there might be lingering surprises: part of the data might live in other databases or in storage facilities like S3 whose backups are not in sync with the primary backup, or there might be caches and queues that need to be reset as part of the recovery procedure.
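
To make "actually follow the recovery procedure and check the results" concrete, here is a minimal sketch of a restore drill. It uses Python's built-in sqlite3 as a stand-in, not RDS: the backup and restore step is a single `Connection.backup` call, and the check is a per-table row count comparison. The function names `row_counts` and `restore_drill` are hypothetical, and a real drill would restore from the actual backup artifact and check far more than counts.

```python
import sqlite3


def row_counts(conn, tables):
    """Return {table: row count}; table names must come from a trusted list."""
    return {t: conn.execute(f"SELECT COUNT(*) FROM {t}").fetchone()[0] for t in tables}


def restore_drill(source, tables):
    """Back up source, restore it into a fresh database, and compare counts.

    Returns a dict of tables whose counts differ; empty means the drill passed.
    """
    expected = row_counts(source, tables)  # recorded at backup time
    restored = sqlite3.connect(":memory:")
    source.backup(restored)  # stand-in for "take a backup, then restore it"
    actual = row_counts(restored, tables)
    restored.close()
    return {t: (expected[t], actual[t]) for t in tables if expected[t] != actual[t]}
```

The point of the sketch is the shape of the procedure: record what you expect at backup time, restore somewhere disposable, and compare, on a schedule rather than during an emergency.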

carapace 3305 days ago

"Backups are a tax you pay for the luxury of restore" [1]

A lot of people pay the tax and never even try the lux.

[1] http://highscalability.com/blog/2014/2/3/how-google-backs-up...

everybodyknows 3305 days ago

Good blog post. This, I suggest, is its most essential point:

"Prove it. If you don’t try it it doesn’t work. Backups and restores are continually tested to verify they work"

dsr_ 3305 days ago

And put a firewall between your dev machines and your production database. All production database tasks need to be done by someone who has permission to cross in to the production side -- a dev machine shouldn't be allowed to talk to it.

StavrosK 3305 days ago

I would argue that no two machines should be allowed to talk to each other unless their operation depends directly on each other. If I want to talk to the database, I have to either SSH to a worker machine and use the production codebase's shell, or SSH directly to a DB machine and use a DB shell.

We've set things up so reports and similar read-only queries can be done from properly firewalled/authenticated/sandboxed web interfaces, and write queries get done by migrations. It's very rare that we need to write to the database directly and not via some sort of admin interface like Django's admin, which makes it very hard to do bulk deletions (it will very clearly warn you).

daxfohl 3305 days ago

Would you recommend all these steps even for a single-person freelance job? Or is it overkill?

_jal 3305 days ago

Depends. Do you make mistakes?

I absolutely do. "Wrong terminal", "Wrong database", etc. mistakes are very easy to make in certain contexts.

The trick is to find circuit-breakers that work for you. Some of the above is probably overkill for one-person shops. You want some sort of safeguard at the same points, but not necessarily the same type.

This doesn't really do it for me, but one person I know uses iTerm configured to change terminal colors depending on machine, EUID, etc. as a way of avoiding mistakes. That works for him. I do tend to place heavier-weight restrictions, because they usually overlap with security and I'm a bit paranoid by nature and prefer explicit rules for these things to looser setups. Also, I don't use RDS.

I'd recommend looking at what sort of mistakes you've made in the past and how to adjust your workflow to add circuit breakers where needed. Then, if you need to, supplement that.

Except for the advice about backups and PITR. Do that. Also, if you're not, use version control for non-DB assets and config!
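
One possible shape for such a circuit breaker, as a minimal Python sketch: a destructive action against a protected environment only proceeds when the operator has typed the environment name out explicitly. The names `guard_destructive` and `WrongEnvironmentError`, and the idea of using the typed environment name as the confirmation, are assumptions for illustration, not something described above.

```python
class WrongEnvironmentError(RuntimeError):
    """Raised when a destructive action targets a protected environment unconfirmed."""


def guard_destructive(target_env, confirmation, protected=("prod",)):
    """Raise unless a protected environment was confirmed by typing its name."""
    if target_env in protected and confirmation != target_env:
        raise WrongEnvironmentError(
            f"refusing to run against {target_env!r} without explicit confirmation"
        )
```

Calling `guard_destructive("dev", None)` passes silently, while `guard_destructive("prod", None)` raises, so a "wrong terminal" mistake fails loudly before any damage is done.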

sixothree 3305 days ago

For Windows servers I use a different colored background for more important servers.

revmoo 3305 days ago

I do this with bash prompt colors on all our servers. Prod is always red.
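
A minimal sketch of how such prompts might be generated, written in Python so it could live in a provisioning script. The environment names, the color choices other than red for prod, and the function name `bash_ps1` are assumptions; the output is an ordinary bash PS1 string using ANSI color codes.

```python
def bash_ps1(env):
    """Return a bash PS1 string colored by environment; prod is bold red."""
    colors = {"prod": "1;31", "staging": "1;33"}  # bold red, bold yellow
    code = colors.get(env, "0;32")  # anything else: green
    # \[ and \] tell bash the enclosed escape sequence takes no screen width.
    return f"\\[\\e[{code}m\\][{env}] \\u@\\h:\\w\\$ \\[\\e[0m\\]"
```

The returned string would be assigned to PS1 in the server's shell profile, for example by writing `export PS1='...'` with the value of `bash_ps1("prod")` on prod hosts.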

kls 3305 days ago

I don't do production support on freelance development jobs. Even if I have to sub the hours to one of my associates, I always have a gatekeeper. That being said, when I design systems, the only way to get to production is via automation, e.g. something gets promoted to a prod branch in GitHub, and production automation kicks off a backup and then applies said changes. The trick is to have a gatekeeper and never have open access to production. It's easy even as a one-man shop. Git automation and CI are simple with tools like GoCD and other CI tooling and only take a day or two to set up, faster if you are familiar with them.
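
The promotion flow described here could be sketched roughly as follows in Python. This is not GoCD configuration; it only shows the ordering a pipeline step would enforce. The function name `promote`, the branch name "prod" as a default, and the two injected callables are hypothetical placeholders for whatever the real pipeline runs.

```python
def promote(branch, take_backup, apply_changes, prod_branch="prod"):
    """Apply changes only for the prod branch, and only after a backup succeeds."""
    if branch != prod_branch:
        return False  # nothing reaches production from other branches
    take_backup()  # if this raises, apply_changes never runs
    apply_changes()
    return True
```

Because the backup runs first and any exception stops the function, a failed backup blocks the deploy instead of being discovered afterwards.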

dennisgorelik 3305 days ago

It depends on how much is at stake. If the product does not have users yet, then there is only a small downside to accidentally killing the database, so it probably makes sense to loosen some production database access restrictions in order to increase the speed of development. But if you already have a legacy system on your hands with many users and lots of data, then it's time to sacrifice some convenience of immediate production database access for security.

wolco 3305 days ago

Depends on what you are hired for. If you are hired to create a web application and you spend time trying to create a stable environment with proper build processes, it might be looked upon poorly. Everyone has different priorities and some have limited budgets.
