by rantanplan 3429 days ago
Debugging directly on the production system?

Yeah, nobody does that. Well, nobody who has any kind of serious operation going on.

If you can't replicate a bug outside a production environment you have a serious deficit in the devops department.

You should be able to stage a machine with a copy of the production DB/data and the same reproducible package (code) that currently runs on your production systems.

Everything else is just an accident waiting to happen.


> Debugging directly on the production system? Yeah, nobody does that.

Uh, what bubble are you living in? Almost everyone does that to some extent. We can all pretend we're code ninjas with super clean deployment pipelines and strict policies, but all of us get lazy occasionally.

Not to mention that, in any successful project, there are going to be systems that need to be looked at by someone else. It's great for the original developer(s) that they can use their IDE to access files directly in containers, but if I'm called to investigate a running system then I'm going to end up in a shell on those hosts.

The only place I can see this never happening is in large enterprises that have huge departments with slow enough turnover that "the accepted workflow" is easily passed on to new hires. Referring to everyone else as "mom & pop shops" makes it sound like the speaker is living in a Silicon Valley enterprise bubble.

I find it amusing to see that term used as an insult on a site largely populated by startups.

Sounds pretty self-serving to say that I "referred to everyone else", just because you want to justify sloppy processes. I only referred to those who do that, and I'm sorry if I'm bursting anyone's bubble here, but nope; not everyone does that!

Also a startup doesn't necessarily have requirements for serious ops, but it may, some day. Well if they want to be an enterprise at least. You do know what the goal of a startup is, don't you?

And btw, no, I don't live in or work for a company in SV. Europe mostly, with a dash of Dallas, Texas in the mix.

In any case, I was hoping for engineering arguments, not personal attacks. :(

"Almost everyone does that".

No they don't. I am not living in a bubble. You're describing a mom and pop shop.

And by the way you don't have to be a code ninja or have a super devops team for that.

You can have your code locally, point your DB config to a staging server with a copy of production (you do backups daily, don't you?) and boom!
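To make the "point your DB config at staging" step concrete, here is a minimal sketch, assuming the app picks its database from an environment variable. The variable name `APP_DB_TARGET`, the host names and the `database_settings` helper are all made up for illustration; they are not from any particular framework.

```python
import os

# Hypothetical connection settings; substitute your own hosts and names.
DATABASES = {
    "local": {"host": "127.0.0.1", "port": 5432, "name": "app_dev"},
    "staging": {"host": "staging-db.internal.example", "port": 5432, "name": "app_prod_copy"},
    "production": {"host": "prod-db.internal.example", "port": 5432, "name": "app"},
}


def database_settings(environ=None, debugging=False):
    """Return the DB settings selected by APP_DB_TARGET (an assumed variable name)."""
    environ = os.environ if environ is None else environ
    target = environ.get("APP_DB_TARGET", "local")
    if target not in DATABASES:
        raise ValueError(f"unknown database target: {target!r}")
    # Refuse to attach a debugging session to the live database.
    if debugging and target == "production":
        raise RuntimeError("debug against the staging copy, not production")
    return dict(DATABASES[target])
```

Running the web process locally with `APP_DB_TARGET=staging` would then hit the copy of production data, while the guard turns "oops, I was pointed at prod" into a loud failure instead of an accident.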

Seriously now... elementary stuff. If you're still changing code directly on live systems you have a long way to go.

And you're describing a pipe dream that doesn't happen in real engineering orgs. I think I can speak for most people here when I say that the best of DevOps intentions only last until your first fire. You can design for best practice, but odds are something will happen that requires you to jump onto a moving train and figure out what the hell is going on. That doesn't mean that DevOps is pointless; it just means that a dose of realism is required to understand how to do real engineering. Sometimes you just won't be able to do things the "right" way because you only have time to do them the "wrong" but still working way before your customer gets pissed. The real world has very different requirements than a management textbook.

Well, I don't know what your social or professional circle is, but it does happen.

What I describe is not out of a management textbook. This is really really elementary stuff if you want to sustain any serious operation.

Let me explain: so you're saying a customer is on fire. Let's say that a query generated by that customer, for some reason, creates a spike in your DB. Let's say that it almost DoSes your system. You know it doesn't happen locally with your dummy/fixture data, and it doesn't happen with other users/customers. So some particular configuration of state (your DB) is creating this problem. What are you gonna do, my friend? Retry the query on production while you have a debugger on the webapp process? Please... be serious.

Remember, you only have to set up this kind of infrastructure once (and you can scale it up or down depending on your resources). And then you can do business like a normal person. What you describe is like throwing my sanity out of the window every time there's a severe (or not) support ticket. It doesn't scale!
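A toy version of the scenario above, as a sketch only: the bad behaviour depends on one customer's data, so you copy the state and rerun the query against the copy. It uses the standard library's `sqlite3` as a stand-in for a real DB; the `orders` table, the customer IDs and the helper names are invented for the example.

```python
import sqlite3
import time


def snapshot(source):
    """Copy a database into a separate in-memory one, so experiments never touch the original."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)
    return copy


def time_query(conn, sql, params=()):
    """Run a query and return (row count, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    return len(rows), time.perf_counter() - start


# Stand-in for production: customer 42 has far more rows than anyone else.
prod = sqlite3.connect(":memory:")
prod.execute("CREATE TABLE orders (customer_id INTEGER, item TEXT)")
prod.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, "a"), (2, "b")] + [(42, f"item-{i}") for i in range(300)],
)
prod.commit()

# A self-join whose cost grows quadratically with one customer's row count.
QUERY = (
    "SELECT a.item, b.item FROM orders a "
    "JOIN orders b ON a.customer_id = b.customer_id "
    "WHERE a.customer_id = ?"
)

staging = snapshot(prod)
for customer in (1, 42):
    count, seconds = time_query(staging, QUERY, (customer,))
    print(f"customer {customer}: {count} rows in {seconds:.4f}s")
```

The query is harmless with fixture-sized data and blows up only for the one customer, and all the poking around happens on `staging` while `prod` is left alone.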

Our host, Paul Graham, talks about his first successful startup sometimes. It's Viaweb, which became Yahoo! Store, and it made him the rich investor he is today.

He edited the code live on the production server sometimes. And he did it in the repl, not by editing source files, so he didn't even have a record of what he changed that could go into version control.

When you're nimble and you need every advantage to compete, sometimes you have to be able to do things the hard way.

Not that I would recommend it; every ethical programmer in the galaxy has denounced the practice. And they have good reasons.

>> in those rare cases when you can't replicate bug locally

> directly on the production system?

Why does 'non-local' have to equate to production? Are you able to use an IDE in your container/VM/cloud server?

Yes, PyCharm supports Vagrant, Docker etc.

My host system is Fedora, but since our production code is deployed on Ubuntu, I have a Vagrant Ubuntu image. PyCharm reads the Vagrantfile and has access to the virtualenv (Python in our case) inside the Vagrant image.

UPDATE: I am not sure if that's what you were asking. Maybe this response (https://news.ycombinator.com/item?id=13543158) I gave is more relevant?

That VM/container is still essentially a remote system. Do you never ever log in to it? What types of instrumentation are you using to characterize the performance and operation of your app in the dev instance?

I can't characterize the performance of my app on a dev instance. I don't even think it's possible. You could characterize it in the sense that you do for algorithms (with regard to time complexity).

But in general this is my (our) process.

- First I write the code.

- Then I test the code locally.

- Then I test the code on an instance with a copy of prod data. If I want to debug at this point, I run the web process locally while my DB configuration points at that specific server with the prod data.

- If it's code that spans more than one subsystem (e.g. it also involves async celery task/queue servers), I test on a staging environment that duplicates (as much as possible) the prod infrastructure.

- Then I deploy.

Obviously there are other various details, plus rollback plans in case something still goes wrong, but that's the gist of it.

> I test on a staging environment that duplicates (as much as possible) the prod infrastructure.

I believe it's possible that nobody ever logs into any instances in the staging environment, but if that's the case then your organization is an exception to the rule. Plus you mention it duplicates the prod infrastructure "as much as possible" rather than being identical.

The point is: don't act superior by feigning ignorance that most shops still have people who occasionally log into Linux hosts.

Log into the machines to your heart's desire. I never said we never log in.

I said we never directly change code on production because it's bat-shit insane. That's why this thread was started. You can trace it back.

I never acted superior, but I really don't get the "me and some people I know do this so everyone must be doing it". No they're not.

And what I described in all my responses (not just to you) can be scaled down to this:

1) Do you know how to install a Linux machine?

2) Do you know how to install a DB?

3) Do you know how to take backups of your DB?

4) Do you know how to change your app's DB config so that it points to another machine?

5) Can you write a shell script of no more than a couple of lines?

You answered yes to all of the above? Then congrats, you can already do what I'm describing. Yes, you can go crazy later with configuration management software (Ansible, Puppet, etc.) and better infrastructure, but the above will suffice as a first step.
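The "couple of lines" script from the checklist above could look roughly like this. It is only a sketch and it assumes things the thread never states: PostgreSQL, backups written as `*.dump` files in pg_dump's custom format, and a staging DB that can be overwritten. The function names are made up; `pg_restore` and the flags used are the real ones.

```python
import subprocess
from pathlib import Path


def newest_backup(backup_dir):
    """Return the most recently modified *.dump file in backup_dir."""
    dumps = sorted(Path(backup_dir).glob("*.dump"), key=lambda p: p.stat().st_mtime)
    if not dumps:
        raise FileNotFoundError(f"no *.dump files in {backup_dir}")
    return dumps[-1]


def restore_command(dump_path, host, dbname):
    """Build the pg_restore invocation that loads a dump into the staging DB."""
    return [
        "pg_restore",
        "--clean",      # drop existing objects before recreating them
        "--no-owner",   # don't try to restore production role ownership
        "--host", host,
        "--dbname", dbname,
        str(dump_path),
    ]


def refresh_staging(backup_dir, host, dbname, run=subprocess.run):
    """Restore the newest backup into staging; `run` is injectable for dry runs."""
    cmd = restore_command(newest_backup(backup_dir), host, dbname)
    run(cmd, check=True)
    return cmd
```

Call something like `refresh_staging("/var/backups/db", "staging-db.internal.example", "app_prod_copy")` from cron (path and host are placeholders) and the staging copy stays close to last night's production data.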

So all this "haha, this guy is in his own SV bubble, haha"... yeah really not getting it, sorry.