Hacker News
by jreese 1690 days ago
As a member of a core infra/"foundation" team, the biggest drain on my soul is the number of other engineers that are helpless, or never learned how to find solutions on their own. They never search wikis, look for similar posts on internal groups, or even read the error message from the tool that tells them exactly how to fix the problem they're asking about. When the culture has become "google everything", but you can't google for internal tools/tech problems, suddenly folks have no idea how to read error messages, debug a stack trace, or solve any problem without hand-holding from a senior engineer. I've been at BigCo for nine years now, and it has only gotten worse as the size of the company has grown exponentially.
6 comments

Think about it from the other side:

Dealing with the infrastructure is your job. You know the ins and outs. It doesn't seem complicated to you.

For someone who is dealing with the application logic, having to context switch to infrastructure is painful. I have no idea how you have set things up. When I read your documentation I'm just baffled at the sheer amount of complexity. I don't want to deal with it. It's not my full time job. I have other things to worry about. I have no mental space to deal with this headache. I will just ask you so I can move on to do my actual job - which is already taking up over 90% of my mental capacity.

When I set up my own infrastructure, I make it as simple and stupid as possible, precisely because I have no mental slack to deal with all the complexity that I see devops people deal with on a daily basis.

At most jobs I've been to, the complexity I've seen in infrastructure is mind boggling. My intuition is that this complexity is not actually needed if you remove all the cruft and think from first principles about what the actual problem is you are trying to solve.

However, people in devops seem to enjoy building and maintaining this kind of complexity, and even take pride in how everything is orchestrated.

To me it's just a headache.

The drain I am talking about is a complete lack of due diligence on behalf of those asking for my help (and my limited time/attention).

When the error message contains a clear message and a URL with step-by-step instructions, and the person in question just pastes the full error message including the URL, and asks "what do I do?"

When a problem can be solved by responding with "please follow the instructions in the error message you sent me".

When someone makes an internal group post asking "how do I do X?" and the pinned post that was right below the submit button contains a summary and link to detailed step-by-step instructions titled "how to X".

I've been in your shoes and I've accepted this fate.

I am the expert. I know things. I choose to change my viewpoint to taking pride in making the best possible explanation that serves the users' need. When they ask me "what do I do?" I no longer snicker or sigh that they don't know or that they are too lazy to find out. Instead I copy-paste the answer they need. It takes me seconds, perhaps it saves them thinking. Thinking seems hard to some people. I choose to believe I save the company some time when I think for them.

Yes, they do not learn when they are spoon-fed the answers. Yes, they could have figured it out. So what if they didn't? They are producing whatever they are supposed to produce.

When I get too tired of the same questions, I change jobs. I am in a C-level position now, and somehow the questions from other C-level managers are similar. They have no idea how they should deal with things from my business area, and that's okay. I'm here to help. I help. When I get too tired of the same questions, I will change jobs.

The way I've seen things work is like this:

devops sets up something convoluted that they for some reason like.

Developers are confused by the thing. It causes them endless problems.

Documentation is sparse, or out of date, or just plain wrong. The instructions to fix problems omit key requirements because the person who wrote them just assumed the reader knows about X and Y when the reader often has no clue. You follow the instructions and the problem is not solved, and maybe to make matters worse, you run into new problems.

The way I see it is: if developers enjoyed dealing with infrastructure, devops jobs would not exist, it would just be something that developers do as part of their job.

As a devops engineer, you should think of your fellow software engineers as your users, not as your "project members".

Your job is to make it as painless and problem-free for them as possible. If problems arise constantly, that's not the user's fault; that's your fault.

> My intuition is that this complexity is not actually needed if you remove all the cruft and think from first principles about what the actual problem is you are trying to solve.

Have you ever worked on infrastructure? This comment is wrong on so many levels.

> Have you ever worked on infrastructure?

Not really. From an application perspective you can generally get away with two really big servers (one of which is the backup), and a load balancer.

There may be more involved in the surrounding infra, but that’s stuff the application developer will never have to touch.

This comment is not wrong on any level.

I have worked in many aspects of web development. I know for sure that everything in the process is way too complicated for what it does. Unnecessarily so.

There's no reason to think that infrastructure is the one place where all the complexity is there for good reason.

Have I worked on infrastructure? Yes and no. Depends on what you mean.

I have worked on infrastructure in the sense that I have set up the infrastructure for websites that can handle thousands of concurrent requests, and the infrastructure I set up is stupidly simple. Everyone who sees it thinks there must be something missing. But there isn't anything missing. It handles the job much better than the complicated mess that I see in all other companies.

Now, the thing I haven't done is actually set up the infrastructure for a truly distributed system that is horizontally scaled. I do all my scaling vertically because computers are so fast and storage capacity is so large that one server can handle thousands of concurrent users without breaking a sweat.

Part of the reason everyone else has a super complicated infrastructure is that they chose bad software technologies to develop their applications, such as Python.

Yeah, if your application backend is in Python or Ruby, you have no choice but to create a complicated infrastructure. Because those languages are so slow, you just cannot scale vertically. That's not an option. You get an explosion of complexity because you also need additional servers to handle stuff that would normally just be part of the application code.

For example, when I program in Go and I need to cache some data, I just create a map, or something similar, to cache data in memory. But if you have Ruby, you can't just do that! Since you have 200 application instances running on AWS, you want to use something like Redis to act as a shared distributed cache. So now your infrastructure has to include information about how the application servers communicate with the caching servers.
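A minimal sketch of what "just create a map" could look like in Go, assuming a single long-running process serving all requests. The `Cache` type and the key names are made up for illustration. It adds a mutex because a bare map is not safe for concurrent use in Go, and it leaves out eviction and expiry, which a real cache would likely need.

```go
package main

import (
	"fmt"
	"sync"
)

// Cache is a hypothetical in-process cache: a plain map guarded by a
// read/write mutex so concurrent request handlers can share it safely.
type Cache struct {
	mu   sync.RWMutex
	data map[string]string
}

// NewCache returns an empty cache ready for use.
func NewCache() *Cache {
	return &Cache{data: make(map[string]string)}
}

// Get returns the cached value and whether the key was present.
func (c *Cache) Get(key string) (string, bool) {
	c.mu.RLock()
	defer c.mu.RUnlock()
	v, ok := c.data[key]
	return v, ok
}

// Set stores or overwrites the value for a key.
func (c *Cache) Set(key, value string) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.data[key] = value
}

func main() {
	cache := NewCache()
	cache.Set("user:42", "alice")
	if v, ok := cache.Get("user:42"); ok {
		fmt.Println(v)
	}
}
```

This only works as a shared cache because there is one process. With many application instances, each would hold its own separate copy, which is the situation the comment says pushes people toward an external cache.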

All of this is unnecessary complication that can go away if you stop what you are doing and just analyze the problem from first principles.

Your massive convoluted infrastructure is only there to compensate for the bad decision you made early in the process, which was to use a slow interpreted language to write the backend of your system.

Most of the time the real solution to a scaling problem is just writing better (and simpler) code.

I've seen, at more than one company, how this complicated infrastructure does nothing to make the application robust: it still grinds to a halt at about 1000 concurrent users.

I guess there is a disconnect between what you mean by the term infrastructure and what I mean.

Let's take an example, say Reddit. You want to support iOS, android and web. All platforms have the same core data but maybe have different APIs for optimizing queries, e.g. you may want to use server side rendering for web.

So here's some of the microservices that you will use server side. I will label which I consider to be infra:

LoginService (infra)

Database server or library (to create consistent views/transactions on top of your raw database, infra)

Core business logic server

Web frontend server

Android/iOS frontend servers (probably not needed)

Image/audio/video processing (infra)

ML server (infra)

I don't see how any of the infra services here are trivial.

If by infrastructure you mean microservices, then it's even worse.

I wholly reject "microservices", the philosophy and ideas behind it, the tooling and the culture surrounding it.

It's not only too complicated. It's worse than useless. It's extra work that produces negative value.

There is a saying: you may think you are intelligent, but you are only as intelligent as what your student is able to learn from you.

It's basically "beautiful mind" syndrome, and it's steeped deeply in ego.

Now if, on the other hand, that teacher is coming from a place where it was hard for him, he probably has a sense of empathy and is willing to teach.

Totally agree. As a Rails dev I'm asked to dig into ops. Not once is the JS team asked to build out the Rails code, or fix things in it when it glitches. The API business logic falls down, and that's on me.

But when ops is glitchy, I have to blow half my day faking interest in Docker? I mean, best case, I learn how ops works, and then what? I become the one they tap when ops is swamped, then before I know it, I'm doing ops.

And I quit, because as much as I love the ops team, I hate the gig. I'd hate to be a designer, or a BA. It's not where I find joy.

So when I hear people say "just follow these steps to troubleshoot", I wonder if it would be cool to tell that to our users.

Think about it from the other side:

Infrastructure is like application dev in that it’s a bundle of histories of feature work over time and with many contributors, and many customers with competing wants and not enough time to satisfy everyone.

Granted, a primary goal should be to have simple APIs/workflows for app devs to use, but complicated insides are as natural here as in any advanced system.

I am working at a company that just won a contract with a BigCo, and it has to be one of the most frustrating experiences I have ever had. I was asked to test a configuration change and I was able to demonstrate that it worked locally in 15 minutes. Unfortunately, testing locally isn't enough, so now I need to test it in the lab, but BigCo is so compartmentalised that it can take weeks even to organise a meeting where all the stakeholders can get together and talk about what needs to be done to get the configuration changed in the lab.

We've had working sessions where the moment there was an issue, the developers at BigCo said that they need to talk to someone on some team and they wrap up the meeting for the day -- even if we had just got started. It's gotten to the point where we will have project managers in the working session -- essentially babysitting -- to stop the devs from just giving up.

The amount of stonewalling I've seen is insane. We were trying to address an issue with our integration and I was told that a particular configuration on the client side was impossible, or maybe that we needed a new license, or perhaps we needed to contract to another vendor and it's going to cost half a million dollars, or whatever other excuses they could come up with.

So I asked them what product BigCo used, I looked up the documentation, I sent an email with 12 steps on how to make the configuration change, I got on a video call with my contact at BigCo (and his boss, to make sure he showed up), and I literally walked him through setting up the configuration.

I just don't understand how you get to the point where that's business as usual.

> I just don't understand how you get to the point where that's business as usual.

Simple: just get 100 junior developers working on the same product, and make the project managers manage them.

I know your frustration and pain, but as someone frequently on the other side of the equation trying to seek out answers, it is often difficult-to-impossible.

Just this past week I was working through some infra setup challenges. I had to piece together half a dozen different documents, several README files, and a few archived Slack threads. These sources were often outdated or even contradicted each other. Even after all that and some earnest trial and error, I still had to ask for help.

It's fine to want to point to the docs, but when you do that you had better have good docs to point to. As an infra team, often your "API" is your docs and templates. Searching Slack threads and piecing together bad docs isn't a scalable solution for every dev to repeat individually. But neither is pinging an individual with every question. Teams need to invest in good documentation.

No, no. This sounds completely different. You searched out all the information you could, and still had no idea. That’s great (I mean, not great, ideally the necessary information would have been available, but that’s on the people providing the information).

The problem is people that have not searched or tried at all, and then ask their senior to please do their job for them.

There is a classic Harvard Business Review article about this issue, from 1974, and it is about monkeys. A must-read.

https://www.academia.edu/36290372/Monkey https://docs.google.com/viewer?a=v&pid=sites&srcid=ZGVmYXVsd...

Hah, this is great. I think I’ll start telling people to keep their monkeys :)

I agree with this and would add Slack to the equation. It's just faster to ping a senior engineer on Slack than to spend 5 minutes looking into an issue.

I've also seen new hires burn an entire week because they were afraid to ask a question.

But I think the balance has definitely shifted towards asking for too much help...

When I started at BigCo, the recommended policy was to spend two hours trying to solve it yourself, including searching the wiki, debugging code, etc. If you hadn't made any progress in that time period, then ask someone from your own team or make a post in a related internal support group. Only after exhausting that route, and getting no help, should you escalate to the team/oncall that owns the tool/service you are having problems with. It seems that culture has been lost.

I still do this and I don't think this is entirely about culture. Some people do the legwork and some take the laziest route. Sure, if the technical tools at hand make it that much harder to do the due diligence, maybe promptly reaching out for help can be (somewhat) justified, but I don't think that is usually the case.

Here is my balance:

Study a bit, WRITE my discoveries down

Craft a message that makes the issue and its background clear to readers (that includes me, once I have forgotten about this)

Send the message to selected recipients, then continue studying, depending on free time available and issue importance

> engineers that are helpless, or never ... or even read the error message from the tool

If they don't read the error messages from their tools, they are not engineers: they are idiots.

> I've been at BigCo for nine years now, and it has only gotten worse as the size of the company has grown exponentially.

You should try to change team or company then. Don't believe all SWEs suck. There are pretty good coders out there :)