by jrochkind1 1663 days ago
Yep. Bugs happen (although having too many of them can be judged).

But I don't understand how github didn't already know about the regression, from automated testing or error monitoring. I have high expectations for github, because they have met them.


A sibling comment (from an engineer at GH) has given some insight: https://github.com/github/feedback/discussions/8149#discussi...

They have error monitoring, but download URLs are tricky to monitor for "errors" correctly: if two URLs 404, how do you know from your Grafana dashboard or whatever which are valid and which aren't? Having run a server whose only purpose is to serve static files myself, I can say you get a lot of 404s, all the time, and it's no indication anything is wrong at all. This case would only be picked up by a dashboard if it fatally caused an error somewhere (like a 500), but by definition this wasn't ever a 500, it's a 404.

As you note, it's just a bug. Sometimes things you don't understand might actually surprise you, it turns out.

That makes sense.

This was reported like it affected all download URLs based on a git tag, which also means all download URLs appearing on github "release" pages.

If so, I'd have expected there would have been some testing that would have caught this too. Of course, sure, bugs in tests happen too.

Obviously bugs happen because bugs happen. I stand by being more disturbed that it took two days to notice and revert the regression than I am by the fact that a regression happened. Users noticed right away and tried to report the problem; it took two days after that for GitHub to comment on it and to revert, which seems problematic, no? Especially if the bug was really affecting as many download URLs as I think was reported; if it was only affecting a minority of edge-case ones, that's more understandable.

It's never possible to eliminate regressions. (It may be possible to reduce the rate of them of course). But whether by testing or by receiving error reports from users, it ought to be possible to notice all major regressions in less than two days.

If this affected all links, couldn’t you just monitor it with a general “do downloads of these URLs work” check? You don’t have to (and shouldn’t) monitor 404 returns to test whether downloads are working; the naive and obvious test is to actually attempt to download something.
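
A minimal sketch of that kind of "actually attempt the download" check, using only the Python standard library. The function names and the idea of a fixed list of known-good URLs are my own assumptions for illustration, not anything GitHub is known to run:

```python
import urllib.error
import urllib.request


def check_download(url, opener=urllib.request.urlopen, timeout=10):
    """Return True if the URL answers 200 and yields at least one byte.

    `opener` is injectable so the check can be exercised without a network.
    """
    try:
        with opener(url, timeout=timeout) as resp:
            # Reading a single byte proves a body is really being served.
            return resp.status == 200 and len(resp.read(1)) == 1
    except OSError:
        # urllib raises HTTPError/URLError (both OSError subclasses)
        # for 4xx/5xx responses and for connection failures.
        return False


def failing_downloads(urls, opener=urllib.request.urlopen):
    """Return the subset of known-good URLs that no longer download."""
    return [u for u in urls if not check_download(u, opener=opener)]
```

You would run `failing_downloads` on a schedule against a handful of URLs that are known to exist (hypothetically, a tag archive from a test repository) and alert when the returned list is non-empty.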
Naively, I imagine the rate of 404s or the percentage of 404s as compared to all responses could be helpful in such a service.
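
A rough sketch of the ratio idea, assuming you can get a list of response status codes for some time window out of your logs. The `factor` and `floor` thresholds are made-up values for illustration and would need tuning against real traffic:

```python
from collections import Counter


def not_found_ratio(status_codes):
    """Fraction of responses in the window that were 404s."""
    counts = Counter(status_codes)
    total = sum(counts.values())
    return counts[404] / total if total else 0.0


def ratio_spiked(current, baseline, factor=3.0, floor=0.01):
    """True if the current 404 ratio is well above the usual baseline.

    `floor` keeps a near-zero baseline from triggering on tiny blips.
    """
    return current > max(baseline * factor, floor)
```

The point is to compare against the normal background level of 404s rather than alerting on any single 404, since a static file server sees plenty of those when nothing is wrong.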