Hacker News
by hyperman1 2177 days ago
A simple way to include performance in development activities without ending up in premature optimization territory is to introduce timing budgets.

Decide for each operation how long it is allowed to take, e.g. pushing this button reacts within 100ms. Make your tests record these times. Optimize when it becomes clear you're not going to make it.
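
A minimal sketch of what such a budget check could look like in Python. The `BUDGETS` table, `run_with_budget`, and `handle_button_press` are hypothetical names made up for illustration; only the 100ms figure comes from the example above.

```python
import time

# Hypothetical budget table, in seconds. Only the 100 ms value is from the text.
BUDGETS = {"button_press": 0.100}


def run_with_budget(name, operation, budgets=BUDGETS, clock=time.perf_counter):
    """Run `operation` once and return (elapsed_seconds, within_budget)."""
    start = clock()
    operation()
    elapsed = clock() - start
    return elapsed, elapsed <= budgets[name]


def handle_button_press():
    # Stand-in for the real handler under test (assumption).
    sum(range(1000))


elapsed, ok = run_with_budget("button_press", handle_button_press)
# Record the time on every run so the trend is visible before the budget is blown.
print(f"button_press took {elapsed * 1000:.2f} ms, within budget: {ok}")
```

Passing the clock in as a parameter is just a convenience so the check itself can be tested without real delays.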

There was an interview with the Intel engineer who got Linux laptops booting in 10s. They budgeted x seconds for the kernel, y seconds for X11, etc.

3 comments

This can work really well with the modern tools for traffic shaping and CPU limiting (either in the browser or the OS): set budgets for things like different network profiles and run your tests in each one. That way you don’t have to argue about whether “good enough” on someone’s tricked-out dev workstation will be that way for a wireless user on a 5-year-old device, while still recognizing that these won’t provide the same experience.
This sounds totally reasonable for a company like Auth0 that has a) a big engineering team and b) tons of people who are depending on their APIs not being slow.

Do you think it's easy enough to get this sort of machinery set up even for small startups?

If you're collecting stats anyway, it's not usually too hard to collect a latency metric and process it.

There's room to argue about methodology (i.e. client vs. server latency, where and how timing probes happen), but just pick something, try to measure no more than 10 phases of the request (timing probes have costs too), and see what happens. After you have a baseline, you can set targets and/or see how releases change your data.
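
As a rough illustration of per-phase timing probes, here is a small Python sketch. `PhaseTimer` and the phase names (`parse`, `query`, `render`) are hypothetical; a real setup would ship the samples to whatever stats system is already in place.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class PhaseTimer:
    """Collects elapsed-time samples per named phase of a request."""

    def __init__(self, clock=time.perf_counter):
        self._clock = clock
        self.samples = defaultdict(list)

    @contextmanager
    def phase(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Record the sample even if the phase raised.
            self.samples[name].append(self._clock() - start)


timer = PhaseTimer()


def handle_request():
    # Hypothetical phases; keep the number small since probes have a cost too.
    with timer.phase("parse"):
        sum(range(100))
    with timer.phase("query"):
        sum(range(10_000))
    with timer.phase("render"):
        sum(range(1_000))


for _ in range(5):
    handle_request()

for name, values in timer.samples.items():
    print(f"{name}: {len(values)} samples, max {max(values) * 1000:.3f} ms")
```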

Why not? It's more a question of attitude than technology.

Measure it any way you want. Take an old PC, drop some monitoring package like Zabbix on it, and let it poll the APIs. Log entry and exit time. It won't be perfect, but even the lowest-effort measurement is better than nothing at all.

It depends on how fancy you want to get. It could be as simple as a unit test that runs some scenarios a dozen times under a timer and fails if the 90th percentile time is greater than the target time.
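
Something along these lines, as a sketch only. The nearest-rank percentile is one of several reasonable definitions, and `scenario` and the 50ms target are made-up placeholders.

```python
import math
import time


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def time_runs(operation, runs=12, clock=time.perf_counter):
    """Run `operation` `runs` times and return the elapsed time of each run."""
    samples = []
    for _ in range(runs):
        start = clock()
        operation()
        samples.append(clock() - start)
    return samples


def scenario():
    # Stand-in for the code path being budgeted (assumption).
    sum(range(10_000))


def test_scenario_p90_within_target():
    target_seconds = 0.050  # hypothetical target
    observed = percentile(time_runs(scenario), 90)
    assert observed <= target_seconds, f"p90 {observed:.4f}s exceeds {target_seconds}s"
```

With a dozen runs, the 90th percentile under this definition is the 11th fastest run, so a single slow outlier does not fail the test.
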
How do you measure this? Run the item multiple times in different envs and take an average?
While you can make this as complex as you want, even something as statistically abhorrent as simply measuring 1 run on a build server gives you a lot of bang for your buck.

The main idea is to have some way to keep a finger on the pulse and notice the worst regressions.

I usually like to set percentile thresholds: bad performance is much more likely to cause user frustration, so I’d set something like p50/p90/p97 values for your supported scenarios (e.g. on a website, you might say “old phone on 3G”, “laptop on WiFi/cable”, and “desktop on fiber”) and treat blowing the top-end budget as the most critical. Fix it in the next release if you can’t meet your 97th percentile targets; otherwise every release should incrementally improve.
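
A sketch of how those per-scenario thresholds might be checked in Python. The millisecond values in `BUDGETS_MS` are invented for illustration, and `blown_budgets` and `severity` are hypothetical helpers; only the scenario names and the p50/p90/p97 split come from the comment above.

```python
import statistics

# Hypothetical budgets in milliseconds, keyed by scenario and percentile.
BUDGETS_MS = {
    "old phone on 3G": {50: 3000, 90: 6000, 97: 9000},
    "laptop on WiFi/cable": {50: 1000, 90: 2000, 97: 3000},
    "desktop on fiber": {50: 500, 90: 1000, 97: 1500},
}


def blown_budgets(samples_ms, budget):
    """Return the percentiles whose observed value exceeds the budgeted limit."""
    # quantiles(n=100) returns 99 cut points; index pct - 1 is the pct-th percentile.
    cuts = statistics.quantiles(samples_ms, n=100)
    return [pct for pct, limit in sorted(budget.items()) if cuts[pct - 1] > limit]


def severity(blown, budget):
    """Blowing the top-end percentile is treated as the most critical case."""
    if not blown:
        return "ok"
    return "fix in next release" if max(budget) in blown else "improve incrementally"


# Made-up measurements: mostly fast, with a slow tail.
samples = [400] * 96 + [5000] * 4
budget = BUDGETS_MS["desktop on fiber"]
blown = blown_budgets(samples, budget)
print(blown, severity(blown, budget))
```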

For network latency, you can set profiles in things like headless Chrome or use the Linux / macOS traffic shaping globally.

Percentile thresholds IMO are the way to go. You'll find that things that don't cause problems at P50 are problematic at higher percentiles, and the higher percentiles are usually where the user pain lies.