by bfgpereira 2353 days ago
This is what I have learned in many years of work: people who know systems should be left to handle those systems.

This is what happens when a developer is left to do the work that a system administrator should be trained to do (though not all are).

For a developer, in most cases, "just works" is the end goal when it comes to systems, not "how it works" or what the implications are of making it work like this.

This really makes me sad.

6 comments

I would disagree: The problem is developers (and users in general, but their lack of formal training is an excuse) being comfortable using interfaces and abstractions they don't fully understand.

Note that the result of this might sound like it makes the idea of a professional system administrator invalid, but that's not true: I think the better SAs of the past had a thorough understanding of what their tools did, and many of them probably even modified those tools. This contrasts with the current situation, where people are poking things in PaaS GUIs and accidentally running up huge bills.

I am agreeing with what you wrote, though. Or at least I am trying to.

I have seen the results of that numerous times, where, for instance, a developer creating a tool decides that his interpretation of a bad requirement is satisfied in a poor way. Or a sysadmin decides that a default configuration is good enough because he did `mv conf.example conf`, and it works.

I guess what I am trying to say is that the learning curve, given the complexity of software / systems nowadays, and the lack of judgement and training among the users / developers / sysadmins of those systems result in decisions where risk is not taken into account.

I miss the time when people genuinely knew what they were doing, and had a mindset that allowed them to avoid / prevent risk in the decisions they made during their daily tasks.

Was there ever a time like that? I’m only in my 20s, but I can’t recall a time when most computer systems weren’t terrible. (I love computers, but honestly, they suck ass.)
Unverifiable working theory (I didn't live through the 80s myself):

I think computer systems were always terrible (I think our brains were only ever able to properly/completely grok the old 8- and 16-bit microcomputers and early game consoles), but because early networks and computer systems were built almost entirely on a combination of naivete and lack of awareness on the part of the large corporations, the sysadmins of the day got free rein to do whatever they wanted however they wanted it, pretty much by default.

With no "here, please build this highly technical thing.... with management's help" boring into your back making you question your own every move, and a status quo (read: world market) that simply didn't understand what was possible and how quickly it could be accomplished, the technical scene of the 80s and 90s was largely owned by the sysadmins, free to run everything at whatever pace they liked. You can imagine that genuine motivation and interest in maintaining mastery of the craft flourished in such an environment. So of course people knew what they were doing.

Lines of respect were drawn. (And script kiddies learned where the "real" sysadmins were and stayed away - lest they be out-pwned.)

Today, everything runs far too quickly to get the remotest handle on which way is up, let alone a competent understanding of all the interactions between everything. At the same time, a lot of early default assumptions (e.g., "UNIX is a universally good design, and C is the best universal programming language") are being thoroughly trounced because of scalability issues nobody would have dreamed of in 1994, and security issues that a bygone era was honestly too complacent to take seriously.

Incidentally, the vitriolic reaction to the introduction of systemd IMO makes for a good example of why conservatism is bad: when push came to shove, those same sysadmins that had free rein to go as fast - or as slow - as they wanted had grown so complacent and non-proactive that they were unable to coherently a) band together and b) argue (showing the "workings-out", not just the solution) against a cause none believed in. Instead there was vitriol, death threats and a peanut gallery louder than a fireworks display. TL;DR, that era did not age well.

(Note that I do not give the example above as a point of "oooo, they're utterly incompetent and incapable of anything" or "80s = bad". Black-and-white-ness is not implied by there being only one datapoint.)

I especially agree with your last note. Again, I’m young, but it seems to me that the shortcomings of humans have a largely homogeneous distribution through time and space.
It's wrong on nigh-on every point in its descriptions both of the 1980s and of today, however.
Mumble mumble The Great Worm mumble mumble.

In other words, I don't think that was a time that actually happened: we just remember the good parts and perhaps a few spectacularly bad parts, but never the mediocre hacks that always make up the remaining 90% of everything.

I submit that you are heavily romanticizing the past. (-:
> being comfortable using interfaces and abstractions they don't fully understand.

I don't think it was an interface or abstraction that got the user in trouble here. They were using a pair of systems in ways that were fine on their own, but that combined led to an emergent vulnerability that they didn't even know to consider.

It may be sheer pedantry but I really do see this as a unique "systems" issue, and this type of 'emergent' property between separate self-contained programs is fully within that domain.

I think it is an abstraction: the one underlying both of those components that combined to create the vulnerability.

The abstractions provided by the OS compose in very surprising and hard to predict ways for humans. This is why newer systems don’t use the same abstractions (JavaScript and browser APIs), or else sandbox them much more thoroughly (iOS).

Those newer tools have newer problems, of course, but I think a lot of the churn and reinvention of tech that we complain about is really about trying to find abstractions that combine in more predictable and useful ways.

> The abstractions ... [are] very surprising

From this[1] insightful video essay by Kyle Kallgren:

>> Metaphor Shear -- That feeling all users experience when you realize the metaphor you are working in is bogus. When the computer fails you and you remember that there are a hundred translations between input and output. Codes and translations we don't have the time or patience to do ourselves. Intellectual labor that we've surrendered to a device.

>> The joke at the center of Douglas Adams Hitchhiker's Guide To The Galaxy is about metaphor shear. The answer to an important question lost on its long journey from input to output. A computer glitch so huge, so strange and so embarrassing that its programmers have to make a computer the size of a planet to file a bug report.

[1] https://www.youtube.com/watch?v=hr9_DcO6G3A

I’m not sure I agree with this. There’s a point where you start infantilizing users and make it difficult or impossible for them to do useful things with their computers. iOS is still like this (although they may be headed in the right direction) and I’m fairly certain the web absolutely is.

I will say Apple absolutely got the built-in ssh client in iOS 13 right.

I'm glad I can do both, but being the pivot point is less than stellar sometimes and can stick you with odd tasks (or way more than you want).

For too many developers "I shouldn't be blocked" is the highest virtue, and that's when things like this happen.

You describe precisely one of my points there. It's incredible how comfortable people are touching components that they don't fully understand. :D
I've encountered a system administrator who left the admin LDAP password (for the entire organization) in plaintext in a world accessible script. I'd tell you the name but I don't want to drag the institution through the mud unnecessarily.
Oh holy hell this is a common mistake. Especially if your developers have local admin rights. Don't expect /etc/skel (OS X) to be unreadable.
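For what it's worth, the "world accessible script" mistake described above is cheap to scan for. Here is a minimal sketch, assuming a POSIX system; the helper names `is_world_readable` and `find_world_readable` are made up for illustration, and it only inspects permission bits, not whether a file actually contains a credential.

```python
import os
import stat


def is_world_readable(path):
    """Return True if the 'other' read permission bit is set on path."""
    mode = os.stat(path).st_mode
    return bool(mode & stat.S_IROTH)


def find_world_readable(root):
    """Walk root and return a sorted list of files readable by any local user."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                if is_world_readable(full):
                    hits.append(full)
            except OSError:
                # File vanished or cannot be stat'ed; skip it.
                continue
    return sorted(hits)
```

Pointing `find_world_readable` at a directory of admin scripts gives a list to review by hand. A permission check like this is only a first pass: the real fix is not keeping the password in the script at all.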
Honestly, I wish more developers were familiar with the Linux command line. It definitely has a learning curve, so it makes sense that people avoid it if it's not necessary to launch their app. That being said, there's a lot of value in knowing how to set up a server yourself, and secure it.
Isn't this exactly the reason people say PHP is awful, though? Hidden pitfalls if you just do what works and aren't security conscious, and too much flexibility instead of one clean standard. And at the end of the day, putting blame on rookie users who keep wandering onto the busy street and get run over by a bus.

At this point there is no reason sane network security shouldn't be baked into popular Linux OSes (except that the cloud is busy abstracting the problem away). Sure, real sysadmins can have the keys to the gun safe, but these are structural problems that could and should be mitigated through modernizing OS design.

Literally nothing you said makes sense.
Sys admins are dead. I have GKE now via a reliable Terraform module. None of my production instances can be logged onto.
But if you run your containers on GKE you are a sysadmin. The system might not be a traditional UNIXy one, but there are still elements you need to take care of that aren't just application development:

Do your containers have security vulnerabilities? How do you keep track of them?

How do they communicate? What's your blast radius if something is compromised?

Who has access to your container management plane?

Have you correctly sized your system?

Is it resilient to failures?

What about your data? Is it backed up? Have you tested the backups? Do you have volumes that can fill up? I/O limits?

All those are sysadmin things that will never go away.
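To make one item from that list concrete ("Do you have volumes that can fill up?"), here is a rough sketch using only the Python standard library. The function names and the 90% threshold are assumptions picked for illustration, not anything GKE or Kubernetes provides.

```python
import shutil


def volume_usage_fraction(path):
    """Return the used fraction (0.0 to 1.0) of the filesystem holding path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        # Some pseudo-filesystems report zero size; treat them as empty.
        return 0.0
    return usage.used / usage.total


def volumes_over_threshold(paths, threshold=0.9):
    """Return the paths whose filesystem is at or above the usage threshold."""
    return [p for p in paths if volume_usage_fraction(p) >= threshold]
```

Running `volumes_over_threshold` over the mount points you care about from a cron job or sidecar is the kind of unglamorous check that still has to exist somewhere, whoever ends up owning it.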

I understand what you mean, but your GKE instances probably can be logged onto: `gcloud compute ssh --zone=<node-zone> <node-name>`

Both COS and Ubuntu nodes are integrated with IAM, so this transparently provisions a user on the node, and copies a temporary SSH key. If your GCP user has the appropriate IAM permissions, they will also be able to use `sudo` to execute commands as root.

Are you sure about that?
Only time will tell if I am in fact right. I'm counting on being more right than the guys who have dedicated staff who routinely shell into their servers.

I suppose we'll see if companies with sysadmins have more breaches than the guys who run their own ops using container orchestration etc. I think I'd go even odds $1k that over the next five years, most large scale data breaches will be at organizations where sys admins run the majority of ops.

There's a whole new category of errors you can make (making your bucket open, etc.) with cloud providers but the tooling has better defaults.

> Only time will tell if I am in fact right.

I guess you prove a point here: you trust that the deployment of your GKE module allows you to be safe, so your investment vs risk trade seems to satisfy you. But you, yourself, cannot even predict how insecure you are at the moment due to the complexity of the software solutions you are using.

> I suppose we'll see if companies with sysadmins have more breaches than the guys who run their own ops using container orchestration etc.

Oh, I do agree, at least with the current state of sysadmins out there.

But the problem is that you lose control of the integrity of your system once you reach a point where software complexity becomes your security entry point.

If you are sure the containers you will be using are secure, have sane defaults, are up to date, etc., then fine, good job! I just don't trust that most people will be able to reassure me of that. And please keep in mind that in your case, the target is not your container platform; the target would be the containers in that platform, and the services they run.

No one can really predict their security accurately.

Say you maintain a bare-metal server in a data center that your company controls.

How much do you know about its physical security? The protocols for admitting new staff? Do you rely on your company’s physical security team and HR? Are any of those functions contracted externally, even partially?

How much do you know about the network security? Do you rely on a networking team? Is any of their work contracted externally? What about the link to the outside world?

How much do you know about the sourcing of the physical hardware?

How much of the source have you audited? How about firmware source?

GKE obviously introduces new factors and vectors, but it also simplifies many of these and adds elements of herd immunity. And it’s also the same as the rest: your system was built by many people, it will be used by many people, and it will be maintained by many people. You can spend all of your time verifying every link, or you can help them do something in the world.

How well do you vet your IT staff? I was once the recipient of an inside job: http://boston.conman.org/2004/09/19.1
You're absolutely right about not being able to predict the security. But, to be honest, I doubt people genuinely do that in a scientific manner.

And yes, I do expect the failure point to be my containers, not the container orch platform. The container orch platform allows me to use containers, though, and the fact that containers aren't append-only the way that long-running machines run by sysadmins are gives me a head start out of the gate.