A tale of latency and broken windows (yenkel.dev)
40 points by yenkel 2177 days ago
3 comments

A simple way to include performance in development activities without ending up in premature-optimization territory is to introduce timing budgets.

Decide for each operation how long it is allowed to take, e.g., pressing this button must produce a reaction within 100 ms. Make your tests record these times. Optimize when it becomes clear you're not going to make it.
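A minimal sketch of the "record these times" part, assuming budgets live in a simple dict keyed by operation name. `timing_budget`, `BUDGETS_MS`, and `handle_button_click` are hypothetical names chosen for illustration, not any particular framework's API.

```python
import time
from contextlib import contextmanager

# Hypothetical registry of per-operation budgets, in milliseconds.
BUDGETS_MS = {
    "button_click": 100.0,
}

# Every measurement is recorded here so tests can report on it later.
recorded = []  # list of (operation, elapsed_ms, within_budget)

@contextmanager
def timing_budget(operation):
    """Time the enclosed block and record whether it stayed within budget."""
    budget = BUDGETS_MS[operation]
    start = time.perf_counter()
    try:
        yield
    finally:
        # Recorded even if the block raises, so failures still show up in the data.
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        recorded.append((operation, elapsed_ms, elapsed_ms <= budget))

def handle_button_click():
    # Hypothetical stand-in for the real work triggered by the button.
    return sum(range(10_000))

with timing_budget("button_click"):
    handle_button_click()

op, elapsed, ok = recorded[-1]
print(f"{op}: {elapsed:.2f} ms, within budget: {ok}")
```

Recording every run rather than asserting immediately keeps the data around, so you can decide later whether a miss is noise or a sign you're not going to make it.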

There was an interview with the Intel engineer who got Linux laptops booting in 10 s. They budgeted x seconds for the kernel, y seconds for X11, etc.
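The same idea applied to a multi-phase process: split one overall target into per-phase budgets and check each phase separately. Only the kernel and X11 phases come from the comment above; the other phase names and every number below are hypothetical placeholders, not the figures the Intel team used.

```python
# Hypothetical per-phase boot budgets in seconds (placeholders, not real data).
PHASE_BUDGETS_S = {
    "firmware": 1.0,
    "kernel": 2.0,
    "init": 3.0,
    "x11": 2.0,
    "desktop": 2.0,
}
TOTAL_BUDGET_S = 10.0

# The phase budgets must fit inside the overall target.
assert sum(PHASE_BUDGETS_S.values()) <= TOTAL_BUDGET_S

def check_phases(measured_s):
    """Compare measured phase durations against their budgets.

    Returns (overrun in seconds per phase that exceeded its budget,
    total measured time, whether the total fits the overall budget).
    """
    overruns = {
        phase: measured_s.get(phase, 0.0) - budget
        for phase, budget in PHASE_BUDGETS_S.items()
        if measured_s.get(phase, 0.0) > budget
    }
    total = sum(measured_s.values())
    return overruns, total, total <= TOTAL_BUDGET_S

# Hypothetical measurements from a single boot.
overruns, total, fits = check_phases(
    {"firmware": 0.8, "kernel": 2.5, "init": 2.6, "x11": 1.9, "desktop": 1.7}
)
print(overruns, round(total, 2), fits)
```

Here the whole boot still fits in 10 s, but the kernel phase is half a second over its own budget, which tells you where to look before the slack elsewhere runs out.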

This can work really well with the modern tools for traffic shaping and CPU limiting (either in the browser or the OS): set budgets for things like different network profiles and run your tests under each one, so you don’t have to argue about whether “good enough” on someone’s tricked-out dev workstation will hold for a wireless user on a 5-year-old device, while still recognizing that these tools won’t reproduce that experience exactly.
This sounds totally reasonable for a company like Auth0 that has a) a big engineering team and b) tons of people who are depending on their APIs not being slow.

Do you think it's easy enough to get this sort of machinery set up even for small startups?

If you're collecting stats anyway, it's not usually too hard to collect a latency metric and process it.

There's room to argue about methodology (e.g., client vs. server latency, where and how timing probes happen), but just pick something, try to measure no more than 10 phases of each request (timing probes have costs too), and see what happens. Once you have a baseline, you can set targets and/or see how releases change your data.
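A minimal sketch of server-side timing probes for a few phases of one request, assuming a synchronous handler. `PhaseTimer`, `handle_request`, and the phase names are hypothetical; the cap of 10 phases follows the suggestion above.

```python
import time

class PhaseTimer:
    """Records elapsed time per named phase of a single request.

    Capped at a small number of phases because every probe has a cost.
    """
    MAX_PHASES = 10

    def __init__(self):
        self._start = time.perf_counter()
        self._last = self._start
        self.phases = []  # list of (name, elapsed_ms)

    def mark(self, name):
        """Close the current phase under `name` and start timing the next one."""
        if len(self.phases) >= self.MAX_PHASES:
            raise RuntimeError("too many timing probes for one request")
        now = time.perf_counter()
        self.phases.append((name, (now - self._last) * 1000.0))
        self._last = now

    def total_ms(self):
        """Time from creation to the last mark, in milliseconds."""
        return (self._last - self._start) * 1000.0

def handle_request():
    timer = PhaseTimer()
    payload = {"user": 42}   # hypothetical: parse the request
    timer.mark("parse")
    rows = [payload] * 100   # hypothetical: query the database
    timer.mark("db")
    body = str(rows)         # hypothetical: render the response
    timer.mark("render")
    return body, timer

body, timer = handle_request()
for name, ms in timer.phases:
    print(f"{name}: {ms:.3f} ms")
```

The per-phase numbers are what you'd ship to your stats pipeline; since the phases are contiguous, they add up to the request's total time.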

Why not? It's more a question of attitude than technology.

Measure it any way you want. Take an old PC, drop a monitoring package like Zabbix on it, and let it poll the APIs. Log entry and exit times. It won't be perfect, but even the lowest-effort measurement is better than nothing at all.

It depends on how fancy you want to get. It could be as simple as a unit test that runs some scenarios a dozen times under a timer and fails if the 90th-percentile time exceeds the target time.
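A minimal sketch of that unit test using Python's standard `unittest`. `scenario_checkout` and the 200 ms target are hypothetical; the percentile uses the nearest-rank method.

```python
import math
import time
import unittest

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list (p given as 0-100)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

def scenario_checkout():
    # Hypothetical stand-in for a real user scenario under test.
    return sorted(range(50_000), reverse=True)

class LatencyBudgetTest(unittest.TestCase):
    RUNS = 12              # "a dozen times"
    TARGET_P90_MS = 200.0  # hypothetical target for this scenario

    def test_checkout_p90_within_target(self):
        timings_ms = []
        for _ in range(self.RUNS):
            start = time.perf_counter()
            scenario_checkout()
            timings_ms.append((time.perf_counter() - start) * 1000.0)
        p90 = percentile(timings_ms, 90)
        self.assertLessEqual(
            p90, self.TARGET_P90_MS,
            f"p90 {p90:.1f} ms exceeds {self.TARGET_P90_MS} ms target",
        )
```

Run it with `python -m unittest` like any other test. With only 12 samples the p90 is effectively the second-slowest run, so it filters out a single hiccup without pretending to be statistically rigorous.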
How do you measure this? Run the item multiple times in different envs and take an average?
While you can make this as complex as you want, even something as statistically abhorrent as measuring a single run on a build server gives you a lot of bang for your buck.

The main idea is to have some way to keep a finger on the pulse and notice the worst regressions.
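To make the single-run-on-a-build-server idea concrete, here's a minimal sketch that compares one build's timings against the previous build's and flags only large slowdowns. `find_regressions`, the operation names, and the 50% tolerance are hypothetical; the loose threshold is an assumption that fits one noisy run per build.

```python
def find_regressions(baseline_ms, current_ms, tolerance=0.5):
    """Flag operations that got more than `tolerance` (50% by default) slower.

    A single run per build is noisy, so only large regressions are
    reported; small wobbles are deliberately ignored.
    Returns {operation: (baseline_ms, current_ms)}.
    """
    regressions = {}
    for op, before in baseline_ms.items():
        after = current_ms.get(op)  # operations missing from this build are skipped
        if after is not None and after > before * (1 + tolerance):
            regressions[op] = (before, after)
    return regressions

# Hypothetical timings from the previous and current builds.
baseline = {"login": 120.0, "search": 300.0, "export": 900.0}
current = {"login": 130.0, "search": 610.0, "export": 880.0}
print(find_regressions(baseline, current))  # → {'search': (300.0, 610.0)}
```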

I usually like to set percentile thresholds: bad performance is much more likely to cause user frustration, so I’d set something like p50/p90/p97 values for each supported scenario (e.g., on a website you might say “old phone on 3G”, “laptop on WiFi/cable”, and “desktop on fiber”) and treat blowing the top-end budget as the most critical: fix it in the next release if you can’t meet your 97th-percentile targets, but otherwise every release should incrementally improve.
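A minimal sketch of tiered percentile budgets per scenario, using the three scenarios above. The millisecond budgets and the sample data are hypothetical, and the "critical"/"warning" labels are just one way to encode "blowing the p97 budget is the most critical".

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list (p given as 0-100)."""
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]

# Hypothetical budgets in milliseconds for each supported scenario.
BUDGETS_MS = {
    "old phone on 3G":      {50: 1500, 90: 3000, 97: 5000},
    "laptop on WiFi/cable": {50: 400,  90: 800,  97: 1500},
    "desktop on fiber":     {50: 200,  90: 400,  97: 800},
}

def evaluate(scenario, samples_ms):
    """Return (severity, breached percentiles) for one scenario's samples.

    Blowing the top-end (p97) budget is treated as critical; breaching
    only lower percentiles is a warning to improve incrementally.
    """
    budgets = BUDGETS_MS[scenario]
    breached = [p for p, limit in sorted(budgets.items())
                if percentile(samples_ms, p) > limit]
    if 97 in breached:
        return "critical", breached
    if breached:
        return "warning", breached
    return "ok", breached

# Hypothetical samples: a fast median with a slow tail.
samples = [150] * 90 + [300] * 5 + [1200] * 5
print(evaluate("desktop on fiber", samples))  # → ('critical', [97])
```

In this example p50 and p90 are both comfortably under budget, yet the slowest few percent of users are waiting 1.2 s, which only the top-end threshold catches.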

For network latency, you can set profiles in things like headless Chrome, or apply traffic shaping globally at the OS level on Linux or macOS.

Percentile thresholds are the way to go, IMO. You'll find that things that don't cause problems at P50 become problematic at higher percentiles, and the higher percentiles are usually where the user pain lies.
This reminds me of a system I worked on: the devs all developed locally with all the services, but in prod the services were on different hosts and the performance was terrible.

This reminds me of the system I worked on: the devs had promised a 3-tier system but delivered a 2-tier system, so management had them add a middle layer and the performance was terrible.

This reminds me of the system I worked on: the devs decided a lockless architecture was the way to better performance, but with the limited threads we had at the time, the performance was terrible for everyone apart from the few people who kept hold of a thread until they were done.

This reminds me of the system I worked on: the database design was a serious bottleneck, but no one wanted to fix it, so the app tier tried changing their queries and the performance was terrible.

I don't know why people spend so much time optimizing the wrong thing - it happens over and over again!

Management doesn't want to admit to a large architectural problem, so it slaps a small bugfix/task on the board in the hope that it will take less time and (appear to) fix things before performance review.
I'm lost on the broken windows analogy. Projects like software are constructed from these discrete elements; criminal activity is not. Bank robbery doesn't have a dependency on littering and jaywalking.

It makes more sense from the perspective of the real-life outcomes of the law enforcement policy. By making an outsized response to trivial infractions a goal divorced from measurement against the desired result, finding reasons to harass citizens became a performance target, and the policy is widely considered to be a failure in its intended outcomes and unintended consequences. Similarly, "Always Eschew Latency" as a task-level target independent of desired project outcomes is likely to result in a product that performs ahead of schedule in every way except delivery date.

AFAIK the theory says littering does have an effect on bank robbery: if people see litter, it instills a mentality where caring for the neighbourhood is worthless. From that 'each for his own', 'dog eat dog' world view, the step to big crimes is smaller.

Now I have no idea if the theory is correct.

> AFAIK the theory says littering does have an effect on bank robbery: if people see litter, it instills a mentality where caring for the neighbourhood is worthless. From that 'each for his own', 'dog eat dog' world view, the step to big crimes is smaller.

> Now I have no idea if the theory is correct.

The theory is correct, but the effect is smaller than proponents supposed (and seems to affect minor property crime more than violence). The poster child for the theory is the reduction in crime in NYC, but more recent analysis has shown that a larger driver was the elimination of lead from gasoline and paint.

I remember a closed-source project I worked on once where it was like that. There was a common core that people just kind of threw stuff over the fence into. Horrible, inconsistent style. Four different ways to do the same thing, each with different bugs, etc.