by anw 1829 days ago
It's cathartic to read other people who have to go through this.

I'm fighting red tape for my team as we build out a dashboard.

Outlook is packed with 1–2 hour meetings for the next 3 months where so far I'm:

* being asked to load test our system to make sure it can handle the load (of 3 people?)

* being asked to integrate with various analytics platforms so we can alert some poor schmuck at 3 AM in case the API goes down then (it's not a vital part of any platform)

* told to have this run in k8s since everything runs in k8s

* other pedantic tasks by Sys Ops who think everything is a nail and love to argue their points ad exhaustium (or worse, argue why their fav stack is the golden child)

I understand the need for standards and making sure they're followed, but there really needs to be a human element of, "is this truly needed for what I'm trying to do?". So many engineering departments are all about automation, but don't truly think through how much automation is actually needed, and fall back on a one-size-fits-all approach instead.

I appreciate that this article comes to the conclusion that the more correct an answer will be, the more complicated it tends to be. I wish more people in decision making positions would understand this.


The minor conclusion of this article was the more interesting (and perhaps more practical) of the two:

Hide concessions to various leaders in the project roadmap.

This isn’t just a “bureaucratic trick” as the OP suggested, it’s actually a way to convert unconditional advice into contingent advice, by encoding a priority.

> to convert unconditional advice into contingent advice, by encoding a priority

This is one of the most important things I've learned as a developer, and one that I thought I invented myself, before I knew about agile, by keeping a whiteboard near my desk with yellow sticky notes ordered by priority:

"Yes, I get that it's a must-have feature, but where do you place it in relation to these other features?"

The concept of prioritization of features, and of saying "if I stopped dead at some arbitrary point in this list, would you have been happy with your order?" seemed so eye-opening to people at the time.

Sometimes, the features are really must-haves though. Let’s say it’s March 2020 and your boss wants you to design a mass-market covid vaccine. You have three requirements: it needs to be safe for human use, it needs to be effective at preventing covid, and it needs to be possible to manufacture. If any one of these is missing, your design is useless. I think a similar dynamic is visible in many software projects.

That's totally fine - but people need to also be aware that if something is really a must then they have to be willing to spend adequate time and resources on getting it done, instead of assuming whatever resources they have on hand will be sufficient.

It's amazing the things that stop being a "must have" as soon as they have to spend more money.

It's a never ending struggle to get people to create this ordered priority. I always tell my developers to say

"if you do not give this an ordered priority, I will resolve items as I see fit. Should we need to stop for one reason or another, there is no guarantee of which have been resolved".

Oftentimes that is okay. I also tell them to always take the ones they're most uncertain about first. Better to front-load hard problems and uncertainties.

A manager once brought up "there are three levers - scope, time (deadline), money (people)", and while it's probably not revolutionary, it did stick with me.

Add more "must have" scope, and something else has to give.

I would argue that the scope lever should be set to 60%, time 60%, and money 35% for software projects.

Software projects are kind of like ovens-- if something cooks perfectly at 300 (temperature units), using 25 minutes and using 5 (money units), that does not mean it will cook perfectly at 600 temperature units using 12.5 minutes and 10 money units. Most likely it will burn.

Even there, drug companies often go through the features in a particular order. You start with a range of formulations which you suspect will be safe for human use, you test them to see which are effective, and then you hand them off to a different set of chemists and chemical engineers whose job it is to figure out how to manufacture the doses at scale.

Every part is necessary, but that doesn't mean that there isn't an ordering. Finding something that's easy to manufacture is pretty useless if it turns out later that it kills the patient. On the other hand, a drug that's safe and effective, but is difficult to manufacture is still a viable drug; worst case, you do what drug companies do all the time and charge obscene prices per dose until you figure out how to scale the process.

Then you do what the drug companies did: you hire consultants to do it for you. I did architecture work for one of the major covid vaccines for most of 2020 and that’s exactly what they did.

The overall tone of the program was “we basically have infinite money, just get it done and the government will pay us back”. So they had a fucking army of consultants to accelerate a process that normally takes 5+ years down to 6 months and they were building down multiple roadmaps just in case they hit a block on one of them.

Yeah, that's not so much a "nifty bureaucracy hack" as a core skill to completing any project. It doesn't even have to be 20 unrelated people's feedback... it's my own priorities quite often that I mercilessly stuff on the backlog. YAGNI isn't just at the micro code level, it's a core project design skill. In fact I probably YAGNI my roadmap much harder than my code since I often have a good idea that I will in fact Need It at the microlevel after decades of experience and can save some time at that level, but at the project roadmap level anything you can trim is getting the product out generating value sooner.

(Obviously one can go too far, blah blah blah. But just as with code, we have a much larger problem in practice grabbing too much from the project feature buffet than too little.)

And then some top-level priorities shift and you look at the backlog thinking "if only we had done x before that".

Usually when your unfinished prototype ends up in production. That's the danger of reporting progress to people who think you can go to space on a paper glider.

Probably half or more of start-ups end up failing like this, as their quickly delivered prototype fails to capture the market due to not actually being better enough, or crumbles under the initial success.

Good for investors and managers who bail out early enough, very bad for users.

> Yagni originally is an acronym that stands for "You Aren't Gonna Need It"

This is it. I do a lot of consulting work around this problem, and the roadmap is where the business and technology meet. It’s where you convert sprints into calendar boxes. It’s also the part most companies do poorly because nobody likes to spend money on good project/program managers (hint: hire product managers instead even though they’re ~25% more expensive because everything in 2021 is a product in some way).

When you do it this way, you can decide well ahead of time if you need to bring in a contractor to build a must-have feature your team won’t have bandwidth for. It flips the narrative and puts the responsibility on the business side (which usually controls the budget anyway).

This works especially well when you set and own those priorities, or if your management supports those priorities. Everyone who wants their feature will need to justify to you that their feature deserves a better placement on your roadmap.

It does not work if you cannot defend your priorities.

> Create an extended product roadmap and put those items at least a year off into the future “and as long as they don’t seem relevant, you can just keep pushing them into the future.”

That actually seems to me like the root cause of all the calamity in the article, a culture of lying.

I don’t see it as lying in any meaningful way. Specifically in the article the problem was that there was technical feedback from many parties that have very little, if any stake in the matter. I’d be willing to bet that none of them even bothered to look at the product roadmap to check on the progress or status of their suggestions.

Rather, the “cause of all the calamity” in the article seems to be the fact that the business has a culture of requiring feedback from random individuals who have very little stake in the project or product delivery.

Upon further reflection, the root cause is poor communication, and the 'bureaucratic judo trick' is just a continuance, or perhaps even an escalation, of an organization's poor culture.

This is absolutely not lying and I'm disappointed that anyone thinks it is. This isn't "not doing the thing and saying you did", it's just setting its delivery date into the future, an entirely routine operation for every software project that actually ships.

From my perspective, the tactic misleads the stakeholders about the real priorities. It's a deception and corrodes trust in the organization. The article even describes it as a 'bureaucratic judo trick.' It really seems to me as analogous to the micro-services guy or the architect guy insisting their way prevails.

I think we’re talking past each other. In my reading of the original article, the author needed to get consensus from various individuals in the organization who were by definition not stakeholders. They had little to no stake in the project or its goals and could therefore block the project with no personal risk.

I could be misreading the article, though.

I agree that it is detrimental to trust to lie about the project roadmap to stakeholders.

I’ll add to the list:

- Ceremonial unit tests for every little thing. The whole system is buggy as hell and we don’t have any confidence that the unit tests are truly covering critical parts of the app. But alas, test coverage, the god damn Pope that can never be bemoaned.

- I’m not making this one up: A/B testing for an internal enterprise app.

Test coverage is almost the perfect illustration of Goodhart’s law. Good programming practices do result in high test coverage, and coverage is very easy to measure, but it is also very easy to fake with useless “tests”. So, when the coverage is measured, the coverage goes up, but stops being meaningful.

> but very easy to fake with useless “tests”

While it doesn't alleviate the problems entirely, you can also run things like mutation tests that check that your unit tests actually test conditions, rather than just execute all the code.
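
To make the mutation-testing point concrete, here is a minimal hand-rolled sketch. Everything in it is hypothetical: `is_bulk_order` is an invented function, and the "mutant" is written by hand, whereas real mutation-testing tools generate mutants automatically. It only shows the idea that a test which merely executes the code cannot tell the original from a mutated copy, while a test with real assertions can.

```python
# Hypothetical function under test (not from the thread).
def is_bulk_order(quantity, threshold=10):
    return quantity >= threshold


# Hand-written "mutant": the same function with >= flipped to >.
def is_bulk_order_mutant(quantity, threshold=10):
    return quantity > threshold


def coverage_only_test(fn):
    """Runs every line of fn (full coverage) but checks nothing."""
    fn(5)
    fn(50)
    return True  # always "passes"


def boundary_test(fn):
    """Asserts the behaviour around the threshold."""
    return fn(9) is False and fn(10) is True and fn(11) is True


def mutant_killed(test, mutant):
    """A mutant is 'killed' when the test fails against it."""
    return not test(mutant)


# Both tests pass against the original implementation.
print(coverage_only_test(is_bulk_order), boundary_test(is_bulk_order))

# Only the boundary test notices the mutated comparison.
print(mutant_killed(coverage_only_test, is_bulk_order_mutant))  # False
print(mutant_killed(boundary_test, is_bulk_order_mutant))       # True
```

A surviving mutant is a hint that the covered line is executed but never actually checked.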

High coverage isn't enough but, in my experience, it's a great place to start.

I've written a depressingly high quantity of code in my career that blows up literally the first time it runs. I'd much rather that happen in a unit test than in production.

Any test that exercises a given branch is better than nothing.

Coverage can tell you what you didn't test, but it can't tell you what you did test.

> Any test that exercises a given branch is better than nothing.

I disagree with this. If you have a test that doesn't actually test anything, you can't tell that you're not really testing that branch. No test is better than a bad test because it's easier to fix.

I have seen bad unit tests being introduced when engineering management starts enforcing a threshold (80% coverage). Often developers will scramble to test trivial methods, such as getters and setters, but will not write any suitable tests that actually cover the business logic. It is even worse when management only enforces an 80% coverage for new changes. In those scenarios developers go out of their way to encapsulate changes in a separate class to avoid having to test the original codebase in a meaningful way.

I think most people, including the managers, are aware of the problem you've highlighted. What's the solution?

The solution is don't measure test coverage. Measure something you actually care about, like minutes of downtime.

Every time a bug is found I ask my team to write a unit test for it to prevent regression for that bug.

During peer review I encourage Engineers to verify that the actual business logic has been tested, for example calculations.
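
A rough sketch of what that can look like, assuming an invented bug report ("a discount over 100% produced a negative total"). The function `order_total`, its parameters, and the bug itself are hypothetical and only there for illustration: one test pins the reported bug, the other checks the actual calculation.

```python
import unittest


def order_total(unit_price_cents, quantity, discount_percent=0):
    """Total in cents. The clamp below is the fix for the hypothetical bug."""
    discount_percent = max(0, min(100, discount_percent))
    gross = unit_price_cents * quantity
    return gross * (100 - discount_percent) // 100


class OrderTotalRegressionTest(unittest.TestCase):
    def test_discount_over_100_does_not_go_negative(self):
        # Regression test for the (hypothetical) reported bug.
        self.assertEqual(order_total(500, 2, discount_percent=150), 0)

    def test_regular_discount_calculation(self):
        # Business logic: 2 x 5.00 with 25% off is 7.50.
        self.assertEqual(order_total(500, 2, discount_percent=25), 750)
```

Saved in a file, this runs with `python -m unittest`. Two tests like these raise the coverage number very little, but they check things somebody actually cares about.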

If done correctly, low unit test coverage can actually deliver more quality than an enforced 80% threshold.

Back when I was struggling to develop features in overengineered hell, I commented to my friends what a breath of fresh air updating a personal site with scp was.

They all gave sighs and shudders of disgust, but then again, they had normal programming jobs, so I suppose it seemed quite backwards to them.

Oh, but scp won’t update it atomically, so you should switch to a scheme that will. Then all you need to do is set cache policies correctly, coordinate with your CDN, and maybe do a staged rollout, just in case.

/s

Seriously though, rsync is your friend. :-)

> Seriously though, rsync is your friend. :-)

Even rsync might not be atomic enough for some situations[1] since it'll update files as it goes rather than in one huge transaction at the end.

[1] I worked on the World Cup 2006 site for Yahoo! and we had this issue - solved with 'rsync --link-dest' and swapping symlinks.
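
For anyone curious, the symlink-swap half of that approach can be sketched roughly as below. This is not the code from that project, just a minimal illustration under a few assumptions: a POSIX filesystem, each release already uploaded into its own directory (for example by an `rsync --link-dest` run), and a web server that serves whatever the `current` symlink points at. The names `publish_release`, `release-1`, `release-2` and `current` are made up.

```python
import os
import tempfile


def publish_release(release_dir, live_link):
    """Point live_link at release_dir in one atomic step."""
    tmp_link = live_link + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clean up after an interrupted earlier run
    os.symlink(release_dir, tmp_link)
    # On POSIX, rename() replaces the old symlink atomically, so readers
    # see either the old release or the new one, never a mix of both.
    os.replace(tmp_link, live_link)


# Demo in a throwaway directory.
root = tempfile.mkdtemp()
v1 = os.path.join(root, "release-1")
v2 = os.path.join(root, "release-2")
os.mkdir(v1)
os.mkdir(v2)
live = os.path.join(root, "current")

publish_release(v1, live)
publish_release(v2, live)
print(os.readlink(live) == v2)  # True
```

Rolling back is the same call with the previous release directory.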

Write a script.

1. stop service 2. copy files 3. start service

Now you have two problems, because for high availability you need a failover or better yet, shadow secondary service.

Hot patching wins, but needs good design to work in the first place.

You just need to decide how appropriate that is for your situation.

As an industry I suspect we tend to over-engineer rather than under. There is a huge spectrum between my single person business with a brochure site and what Google or Apple needs. I'm willing to bet most programmers are working closer to the first than the second.

> 1. stop service 2. copy files 3. start service

That assumes you can stop the service which, for many things (like the World Cup website), isn't really possible.

You know, I think I might have switched to rsync at one point- I haven't had the site in a few years now, so my memory is a bit hazy.

It was small enough (no heavy media files) that I didn't mind if I left some unused files up there. Pretty much the only thing that I had to do was make a copy of the sqlite database each time just in case.

--delete-after will delete files on the destination after everything else has synced so you can be sure you aren't linking to a non-existent asset.

The other side of the coin, which you are not mentioning, is: let's ship this small project to production without all those useless bells and whistles, and then fast-forward 12 months, suddenly everybody is using it and it starts failing spectacularly, and now all those teams that complained in the beginning have a fire to extinguish. I've been on this other side of the coin too many times.

> being asked to load test our system to make sure it can handle the load (of 3 people?)

The problematic load in a dashboard isn't users; it's querying the data sources to get up to date information. For example, if you're running a query to aggregate a bunch of things with lots of joins and that query takes 1.5s to run but your dashboard tries to run it every second so it can be 'real time' then you're in for a bad time even with just 1 user. You absolutely need to load test a dashboard application that's running against production data.
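
A back-of-the-envelope simulation of that 1.5s-query-every-second case, as a sketch only. It assumes the data source runs one of these queries at a time in arrival order, which is a simplification (a real database may run them concurrently and slow down in other ways). `queue_delays` and its parameters are made-up names.

```python
def queue_delays(query_seconds, poll_interval, polls):
    """Seconds each poll waits before its query starts (single worker, FIFO)."""
    delays = []
    free_at = 0.0  # time at which the worker finishes its current query
    for i in range(polls):
        issued_at = i * poll_interval
        start = max(issued_at, free_at)
        delays.append(start - issued_at)
        free_at = start + query_seconds
    return delays


# One user, a 1.5 s query, polled every second for a minute.
delays = queue_delays(query_seconds=1.5, poll_interval=1.0, polls=60)
print(delays[:4])   # [0.0, 0.5, 1.0, 1.5]
print(delays[-1])   # 29.5

# The same dashboard with a 0.5 s query never falls behind.
print(max(queue_delays(query_seconds=0.5, poll_interval=1.0, polls=60)))  # 0.0
```

Under these assumptions every poll falls another half second behind, so after a minute the "real time" numbers are about 30 seconds stale, with a single user.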

> being asked to integrate with various analytics platforms so we can alert some poor schmuck at 3 AM in case the API goes down then (it's not a vital part of any platform)

It might not be vital right now, but if you make a dashboard for it then it'll quickly become vital. Putting metrics in front of people focuses them on those metrics...

I've downvoted your answer here as I think it is an ungenerous interpretation of the post you replied to.

It's just as likely that OP did already know that what you are insisting on is not relevant to their use case. That might be why they stated it.

Cathartic is certainly the word. The title in particular really hits the mark for me.

There are a lot of people talking about computer programs, and telling us we should do things this way or that way. Even telling us that their way is certainly the best or only correct way.

A great many of these people - perhaps the majority - are plain wrong. Some of them talk such nonsense that I suspect they don't have any actual ability to program at all!

How can they be so sure of themselves?

> * told to have this run in k8s since everything runs in k8s

I've seen a production system handling one request (which takes a handful of ms) every 2 seconds (work hours only, mind) in k8s running 8 pods. It is quite breathtaking.

How do they handle the load balancing with that much traffic?

I feel your pain. Been there, done that, probably still have the t-shirt.