by zerkten 838 days ago
Can you be specific about the cost of building these?

I've run into many situations where something was deemed too costly, the gap was found out later, and the team ultimately had to implement it all while hoping no one groks that it was predicted. "Nobody ever gets credit for fixing problems that never happened" (https://news.ycombinator.com/item?id=39472693) is related.


When I was in Search 15 or so years ago, there was actually a very direct cost: revenue.

The AdMixer was an "optional" response for the search page. If the ads didn't return before the search results did, the search would just not show ads, and Google wouldn't get any revenue for it. Showed the premium that Google of the day put on latency and user experience. I think we lost a few million per year to timeouts, but it was worth it for generating user loyalty, and it put a very big incentive on the ads team to keep the serving stack fast.
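
A minimal sketch of that "optional response" idea, not the actual Google design: the ads request runs concurrently with the search request, and whatever has not arrived by the time the results are ready is simply dropped. The names `fetch_results`, `fetch_ads`, and `serve`, and the sleeps standing in for backends, are all hypothetical.

```python
import asyncio


async def fetch_results(query):
    # Hypothetical stand-in for the search backend.
    await asyncio.sleep(0.01)
    return [f"result for {query}"]


async def fetch_ads(query, delay):
    # Hypothetical stand-in for the ads backend; `delay` simulates its latency.
    await asyncio.sleep(delay)
    return [f"ad for {query}"]


async def serve(query, ads_delay):
    # Start the optional request first so it runs concurrently.
    ads_task = asyncio.create_task(fetch_ads(query, ads_delay))
    results = await fetch_results(query)

    # The required response is ready. Take the ads only if they already
    # arrived; otherwise cancel the request and serve the page without them.
    if ads_task.done():
        ads = ads_task.result()
    else:
        ads_task.cancel()
        ads = []
    return {"results": results, "ads": ads}


if __name__ == "__main__":
    print(asyncio.run(serve("shoes", ads_delay=0.0)))
    print(asyncio.run(serve("shoes", ads_delay=1.0)))
```

The page never waits on the optional dependency, so the slow side pays the cost of being slow rather than the user.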

No idea if it's still architected like that, I kinda doubt it given recent search experiences, but I thought it was brilliant just for the sake of aligning incentives between different parts of the organization.

The developer, tester and devops time required to properly implement graceful degradation could easily accumulate to hundreds of hours.

Those hours are directly expensive when your developers cost hundreds of dollars a day, and they carry a material opportunity cost, in that commitment to one particular project delays the delivery of other features.

Moreover, any new features would have to be made compatible with the graceful degradation pattern, creating an ongoing cost.

When you hire an engineer to build a dam, you expect them to consider piping and subsurface flows such that the foundation isn't swept out in a decade. No matter if the engineer was already paid, retired, etc.

My point isn't that we all need to make dams that can hold up for a century. The point is that you hire an engineer because you want someone with the judgement and expertise to apply the correct amount of engineering to any given solution. Over-engineering is on the pathway to correct-sized engineering. It's the experience, discovery, and exploration required to arrive at choosing what things actually do not need to be done.

When your manager asks you, "Do we really need to do that?", it's the expert who can explain why it really is necessary, and the professional who accepts "we're not going to do that" as an answer. And if they still feel it would be harmful not to do it, then that's where professional duty kicks in.

There are a lot of levels to the approach.

Just spending a few moments to consider whether queues should grow, block, or spill when adding them makes a big difference, along with choices in error handling. You can get a lot of things to gracefully degrade for free if that's a part of your decision-making process.
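
To make the grow / block / spill choice concrete, here is a rough sketch using Python's standard `queue` module. The policy names, `make_queue`, `enqueue`, and the `overflow` list are made up for illustration, and "spill" is assumed here to mean diverting items somewhere else (a list standing in for disk, a dead-letter store, or a drop counter) when the queue is full.

```python
import queue


def make_queue(policy, capacity=2):
    # "grow": unbounded queue, memory is the only limit.
    if policy == "grow":
        return queue.Queue(maxsize=0)
    # "block" and "spill" both use a bounded queue.
    return queue.Queue(maxsize=capacity)


def enqueue(q, item, policy, overflow, timeout=0.05):
    """Add item according to policy; return True if it landed in the queue."""
    if policy == "grow":
        q.put_nowait(item)  # an unbounded queue is never full
        return True
    if policy == "block":
        try:
            # Backpressure: the producer waits for space, up to `timeout`.
            q.put(item, timeout=timeout)
            return True
        except queue.Full:
            return False
    if policy == "spill":
        try:
            q.put_nowait(item)
            return True
        except queue.Full:
            # Divert the item instead of waiting or growing.
            overflow.append(item)
            return False
    raise ValueError(f"unknown policy: {policy}")


if __name__ == "__main__":
    overflow = []
    q = make_queue("spill", capacity=2)
    for i in range(3):
        enqueue(q, i, "spill", overflow)
    print(q.qsize(), overflow)
```

Each policy fails differently under load: "grow" trades latency for memory, "block" pushes the slowdown back to the producer, and "spill" keeps both sides moving at the cost of handling the diverted items later.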

Could be as simple as just some feature flags with environment variables.
I also found that when building a feature iteratively, with feature flags for rollout, a simple feature degradation path often appears natively.
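
A minimal sketch of the environment-variable flag idea, assuming a hypothetical `FEATURE_RECOMMENDATIONS` variable and a hypothetical `recommendations` feature. The "off" branch is the degradation path: the page still renders, just without the feature.

```python
import os

# Values treated as "on"; this set is an assumption, pick your own convention.
TRUTHY = {"1", "true", "yes", "on"}


def flag_enabled(name, default=False, env=os.environ):
    """Read a boolean feature flag from an environment-style mapping."""
    raw = env.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


def recommendations(user_id, env=os.environ):
    # Flag off (or unset): degrade to an empty list instead of failing.
    if not flag_enabled("FEATURE_RECOMMENDATIONS", env=env):
        return []
    # Flag on: hypothetical stand-in for the real feature.
    return [f"item-for-{user_id}"]


if __name__ == "__main__":
    print(recommendations("u1", env={"FEATURE_RECOMMENDATIONS": "true"}))
    print(recommendations("u1", env={}))
```

Passing `env` explicitly keeps both the enabled and the degraded configuration easy to exercise in tests without touching the real process environment.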
For one, it potentially multiplies the testing and regression testing requirements to hit all those additional configurations.
Effectively every piece of software written for at most a few thousand people to use concurrently (i.e. 99.99% of software).

Consumer apps that scale to hundreds of thousands of users with five 9s+ uptime requirements are very rare.