Hacker News
by bastawhiz 922 days ago
> While I knew that some pages were less frequently visited than others, I was surprised to see that there were modules that accounted for less than 0.1% of page visits. This meant I could remove them entirely without affecting 99.9% of users. I could delete entire directories containing dozens of files and thousands of lines of code.

I don't know about the author's application, but a data-driven approach is not going to give good results. In my podcast hosting software, the "delete podcast" button is used maybe once every other week (by the numbers). But if I got rid of it, I'd be fielding support requests from every user who wants to delete a show. And that cleanup is a highly manual process.

Which is to say, features don't need to be a high percentage of traffic to be important.

5 comments

Yeah, this is a terrible way to do refactoring. And I mean, "page visits"? In theory all users could be affected by the removed features.

The 0.1% and 99.9% are not complements!
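To make that concrete, here is a minimal sketch with made-up numbers. The `log` data, the page names, and both helper functions are hypothetical, not from the article. Every user touches the rare page exactly once, so it accounts for 0.1% of page visits but 100% of users:

```python
def visit_share(log, page):
    """Fraction of all page visits that went to `page`."""
    return sum(1 for _, p in log if p == page) / len(log)

def user_share(log, page):
    """Fraction of distinct users who visited `page` at least once."""
    all_users = {user for user, _ in log}
    affected = {user for user, p in log if p == page}
    return len(affected) / len(all_users)

# Hypothetical log: 10 users, 1,000 visits each, one of them to the rare page.
log = []
for user in range(10):
    log.extend((user, "dashboard") for _ in range(999))
    log.append((user, "delete-podcast"))

rare_visit_share = visit_share(log, "delete-podcast")
rare_user_share = user_share(log, "delete-podcast")
print(f"{rare_visit_share:.1%} of visits, {rare_user_share:.0%} of users")
```

Real traffic will sit somewhere between the two extremes, but the share of visits alone does not tell you which.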

"Data driven" development is so much BS since so many devs don't care to think properly about sampling and statistics.

It makes me wonder how many people hit the checkout flow on some websites. I'm sure the numbers for some businesses (automakers, maybe) aren't too dissimilar.
We do have a checkout flow, and yes, that page is visited by a very small number of users. However, you don't actually think I deleted the checkout flow, do you?
No, of course not. My point is that sometimes pages, code, and flows that are used by a minority of users are vital to business operations.
This is the line that I came to the comments looking for.

It's possible that the author did take this into consideration and glossed over the details, but it's not something that I as a reader take for granted. And it's not something that I as a coworker would take someone's word for, especially if they didn't list at least some examples of the things being removed.

Not trying to pile on the author. Trimming the fat on a codebase is great, and for sure a worthwhile exercise that not enough organizations pay mind to.

Our situation is a bit unique, because we're currently a two person company. I'm familiar with the product, our users, and the codebase, which allowed me to move quickly. This probably wouldn't work at a larger organization.
Also, 0.1% of users seems small, but the actual number of users this represents matters more.

> web application that serves hundreds of thousands of requests every day

So in this case, that's at least hundreds of requests per day. I mean, it's not small.
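The back-of-the-envelope arithmetic, as a sketch. It assumes "hundreds of thousands" means somewhere between 100,000 and 900,000 requests per day, and that the 0.1% share of page visits carries over to requests, neither of which the article states:

```python
def rare_requests_per_day(daily_requests, share_per_thousand=1):
    """Requests per day hitting a feature with the given share (1 per thousand = 0.1%)."""
    return daily_requests * share_per_thousand // 1000

# Assumed bounds for "hundreds of thousands of requests every day".
low = rare_requests_per_day(100_000)
high = rare_requests_per_day(900_000)
print(low, high)
```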

I agree completely with this comment, and with another comment asking "which percent contains the value of the product", because that might be the most important 0.1%.

I didn't say I deleted everything that had less than 0.1% of page visits. :)

For instance, I didn't delete the "Account Settings" page. Features don't need a high percentage of traffic to be important, but less important features generally account for a very low percentage of traffic.

It's not a criticism of you, and I'm giving you the benefit of the doubt! But I think there's plenty of folks who might take away the wrong lesson here.
If you could somehow track the frequency spectrum of usage, then you could mostly handle that problem. "Oh, the frequency spectrum strongly peaks at about once a week? I guess users regularly need to use this feature. Probably important somehow." vs "Huh, the frequency spectrum isn't strongly peaked anywhere, but is substantially lower at high frequencies. We can probably remove this."
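A rough sketch of what that could look like, assuming you have daily usage counts for a feature. The function names and the 56-day sample are made up for illustration, and the naive DFT here is only meant for short series:

```python
import cmath

def magnitude_spectrum(counts):
    """Naive DFT magnitudes for bins 1..N//2 of the mean-removed series."""
    n = len(counts)
    mean = sum(counts) / n
    centered = [c - mean for c in counts]
    mags = []
    for k in range(1, n // 2 + 1):
        total = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        mags.append(abs(total))
    return mags

def dominant_period(counts):
    """Return (period in days, peak-to-mean ratio) of the strongest bin.

    Ties between harmonics are resolved toward the lowest frequency.
    """
    mags = magnitude_spectrum(counts)
    peak = max(mags)
    mean_mag = sum(mags) / len(mags)
    if mean_mag == 0:
        # Perfectly flat usage: no periodic component at all.
        return float(len(counts)), 0.0
    k = next(i + 1 for i, m in enumerate(mags) if m >= peak * (1 - 1e-9))
    return len(counts) / k, peak / mean_mag

# Hypothetical feature used once a week for 8 weeks.
weekly = [1 if day % 7 == 0 else 0 for day in range(56)]
period, ratio = dominant_period(weekly)
print(period, ratio)
```

A large peak-to-mean ratio at a period of about 7 days would be the "strongly peaks at about once a week" case. What counts as "large" is a judgment call you would have to tune on real data.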
Imagine a timekeeping system where employees regularly enter their hours and schedule time off. But somewhere there is a button or screen called "run payroll" that gets used once a week. Metrics can show that it is periodic and not available to everyone, but they cannot capture how important it is.

Nevertheless, my favorite thing to do with code is running over it with the delete key. Just be careful about what gets it ;-)

that's a great example!

And this is why the users, rather than the developers, are (or at least should be) the ultimate deciders of which features ought to be removed.

I think the parent means that the mean time between podcast deletions is two weeks, not that they are spread exactly two weeks apart. If the deletions are random but on average every two weeks, then your frequency spectrum will show nothing meaningful either.
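A quick simulation shows the difference. This sketch assumes deletions arrive as a Poisson process (exponentially distributed gaps with a 14-day mean), which is my assumption and not something the parent stated. The seed and sample size are arbitrary:

```python
import random
import statistics

def simulate_gaps(mean_days, count, seed=0):
    """Draw `count` exponentially distributed gaps (in days) between events."""
    rng = random.Random(seed)
    return [rng.expovariate(1 / mean_days) for _ in range(count)]

# Random arrivals that average one deletion every two weeks.
random_gaps = simulate_gaps(14, 10_000)
# Deletions spaced exactly two weeks apart, for comparison.
regular_gaps = [14.0] * 10_000

random_mean = statistics.mean(random_gaps)
random_spread = statistics.pstdev(random_gaps)
regular_spread = statistics.pstdev(regular_gaps)
print(random_mean, random_spread, regular_spread)
```

Both series average about 14 days between deletions, but the random one has a spread roughly as large as its mean, while the regular one has none. Only the regular one would produce a clean peak in a spectrum.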