> While I knew that some pages were less frequently visited than others, I was surprised to see that there were modules that accounted for less than 0.1% of page visits. This meant I could remove them entirely without affecting 99.9% of users. I could delete entire directories containing dozens of files and thousands of lines of code.
I don't know about the author's application, but a data-driven approach is not going to give good results. In my podcast hosting software, the "delete podcast" button is used maybe once every other week (by the numbers). But if I got rid of it, I'd be fielding support requests from every user who wants to delete a show. And cleaning up after a deleted show is a highly manual process.
Which is to say, features don't need to be a high percentage of traffic to be important.
It makes me wonder how many people hit the checkout flow on some websites. I'm sure the numbers for some businesses (automakers, maybe) aren't too dissimilar.
We do have a checkout flow, and yes, that page is visited by a very small number of users. However, you don't actually think I deleted the checkout flow, do you?
This is the line that I came to the comments looking for.
It's possible that the author did take this into consideration and glossed over the details, but it's not something that I as a reader take for granted. And it's not something that I as a coworker would take someone's word for, especially if they didn't list at least some examples of the things being removed.
Not trying to pile on the author. Trimming the fat on a codebase is great, and certainly a worthwhile exercise that not enough organizations pay attention to.
Our situation is a bit unique, because we're currently a two person company. I'm familiar with the product, our users, and the codebase, which allowed me to move quickly. This probably wouldn't work at a larger organization.
Also, 0.1% of users seems small, but looking at the actual number of users this represents is more important.
> web application that serves hundreds of thousands of requests every day
So in this case, that's at least hundreds of requests per day. I mean, it's not small.
I agree completely with this comment, and with another one asking "which percent contains the value of the product?", because that might be the most important 0.1%.
I didn't say I deleted everything that had less than 0.1% of page visits. :)
For instance, I didn't delete the "Account Settings" page. Features don't need a high percentage of traffic to be important, but less important features generally account for a very low percentage of traffic.
It's not a criticism of you, and I'm giving you the benefit of the doubt! But I think there's plenty of folks who might take away the wrong lesson here.
If you could somehow track the frequency spectrum of usage, then you could mostly handle that problem. "Oh, the frequency spectrum strongly peaks at about once a week? I guess users regularly need to use this feature. Probably important somehow." vs "Huh, the frequency spectrum isn't strongly peaked anywhere, but is substantially lower at high frequencies. We can probably remove this."
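A minimal sketch of that idea, using only the Python standard library: treat one feature's daily usage counts as a time series, compute a naive DFT power spectrum, and report a period only when one frequency clearly dominates. The function names, the 364-day window, the usage numbers and the 20× peak-to-mean threshold are illustrative assumptions, not anything measured in the thread.

```python
import cmath
import math
import random


def power_spectrum(daily_counts):
    """Naive DFT power spectrum of a daily usage series.

    Returns (frequency_in_cycles_per_day, power) pairs for k = 1 .. n // 2.
    The series is mean-centered first, so the constant (DC) component drops out.
    """
    n = len(daily_counts)
    mean = sum(daily_counts) / n
    centered = [c - mean for c in daily_counts]
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(centered)
        )
        spectrum.append((k / n, abs(coeff) ** 2))
    return spectrum


def dominant_period(daily_counts, min_peak_ratio=20.0):
    """Period in days of the strongest frequency, or None if nothing stands out.

    "Stands out" means the peak power is at least min_peak_ratio times the mean
    power over all non-DC frequencies (an arbitrary, illustrative threshold).
    """
    spectrum = power_spectrum(daily_counts)
    freq, peak = max(spectrum, key=lambda fp: fp[1])
    mean_power = sum(p for _, p in spectrum) / len(spectrum)
    if mean_power == 0 or peak / mean_power < min_peak_ratio:
        return None
    return 1 / freq


# Hypothetical feature used mostly around one day of the week, for 52 weeks.
weekly_feature = [[9, 7, 3, 0, 0, 3, 7][day % 7] for day in range(364)]
print(round(dominant_period(weekly_feature), 2))  # 7.0

# Hypothetical feature used on random days, about once every two weeks on average.
rng = random.Random(0)
rare_feature = [1 if rng.random() < 1 / 14 else 0 for _ in range(364)]
print(dominant_period(rare_feature))  # None
```

A None result only says the usage is irregular, not that the feature is unimportant: a feature like "delete podcast" can be rare, randomly timed and still essential.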
Imagine a timekeeping system where employees regularly enter their hours and schedule time off. But somewhere there is a button or screen, used once a week, called "run payroll". Apart from showing that it is periodic and not available to everyone, no metric can capture how important it is.
Nevertheless, my favorite thing to do with code is run over it with the delete key. Just be careful about what you run over ;-)
I think the parent means that the mean time between podcast deletions is two weeks, not that they are spread exactly two weeks apart. If the deletions are random but on average every two weeks, then your frequency spectrum will show nothing meaningful either.
> modules that accounted for less than 0.1% of page visits. This meant I could remove them entirely without affecting 99.9% of users.
No, it doesn't. These are different metrics: the same users who make up those 99.9% of visits could, once in a blue moon, need a very important page and be negatively affected. In theory, this could be the case for every single user.
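To make the distinction concrete, here is a small sketch that computes both metrics from a visit log. The (user_id, module_name) record format, the function name and the module names are hypothetical. In the example every user visits "export" exactly once, so it accounts for 0.1% of visits yet reaches 100% of users.

```python
def visit_share_vs_user_share(visits, module):
    """Compare a module's share of page visits with its share of users.

    visits: iterable of (user_id, module_name) records, one per page visit
    (a hypothetical log format). Returns (visit_share, user_share).
    """
    total_visits = 0
    module_visits = 0
    all_users = set()
    module_users = set()
    for user_id, module_name in visits:
        total_visits += 1
        all_users.add(user_id)
        if module_name == module:
            module_visits += 1
            module_users.add(user_id)
    return module_visits / total_visits, len(module_users) / len(all_users)


# Ten users, each with 999 dashboard visits and a single visit to "export".
log = [(u, "dashboard") for u in range(10) for _ in range(999)]
log += [(u, "export") for u in range(10)]
print(visit_share_vs_user_share(log, "export"))  # (0.001, 1.0)
```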
The Word screenshot is another illustration of the misplaced nature of this criticism: you wouldn't enable all of those toolbars in reality to obscure everything! If you wanted something, the beauty is that you could just drag the button you needed from a toolbar to a more condensed one.
Or enable a toolbar when you needed it, and then disable it.
"A lot of software developers are seduced by the old ‘80/20’ rule. It seems to make a lot of sense: 80% of the people use 20% of the features. So you convince yourself that you only need to implement 20% of the features, and you can still sell 80% as many copies."
“Unfortunately, it’s never the same 20%. Everybody uses a different set of features."
We have a Linux user self-serve page where we can reset our passwords when they expire. The catch is that we never use that password, because all logins are done with keys, but you can't log in if your password has expired. So imagine someone "data-driven" getting rid of that page because it is rarely used.
Why do the passwords expire if they're not used? "best practice"
Agree with both you and above that low usage is not a good signal of unimportance, but “reset password” is actually used _very_ frequently across the spectrum of average joe users. Here’s an example source: https://www.statista.com/statistics/1303484/frequency-of-pas...
I'm torn; there's lots of angst about 'but somebody might need that feature one day!'
Isn't this exactly what causes feature creep? It's not a fair argument, no fairer than 'it's only used by 0.1% of pageviews'. Neither is the full story.
If there's another way to accomplish the same thing, remove the more complex one. If the feature is part of a process that can be done another way, remove it. If the feature is used by an actual 0.1% of users, the impact of removing it is small.
Anyway, back in the bad old days I had a friend who ran a bulletin board (a bank of phones connected to modems that connected to a bank of computers) that hosted around 300 games. Folks would log in, play a game or two, and log out.
He checked; only 10 games ever got played, pretty much. So he started removing most of the rest.
Callers declined 80% in the first week. Disaster; they paid by the minute.
See, folks were browsing his games, the most on any bulletin board! That's why they came to him, to see all that.
Then, sure, they'd play the same popular games nearly every time. But they had to see the other ones there to feel like it was the right place to go.
Sometimes, it's not about the feature being used. It's about the user's confidence they won't get stuck (I can always back out this change! Oh! The backout button is gone?! Panic), or feel the product is supported adequately, or even, it's a checkbox on a purchase requirements list.
> A lot of software developers are seduced by the old “80/20” rule. It seems to make a lot of sense: 80% of the people use 20% of the features. So you convince yourself that you only need to implement 20% of the features, and you can still sell 80% as many copies.
> Unfortunately, it’s never the same 20%. Everybody uses a different set of features. In the last 10 years I have probably heard of dozens of companies who, determined not to learn from each other, tried to release “lite” word processors that only implement 20% of the features. This story is as old as the PC. Most of the time, what happens is that they give their program to a journalist to review, and the journalist reviews it by writing their review using the new word processor, and then the journalist tries to find the “word count” feature which they need because most journalists have precise word count requirements, and it’s not there, because it’s in the “80% that nobody uses,” and the journalist ends up writing a story that attempts to claim simultaneously that lite programs are good, bloat is bad, and I can’t use this damn thing ’cause it won’t count my words. If I had a dollar for every time this has happened I would be very happy.
It's also gaslighting... I'm imagining a whole bunch of users trying to find some rarely used feature they remember being there, discovering it isn't in the menu anymore, and questioning their sanity.
Gaslighting is deliberately doing things to make someone question their memory, sanity or perception of reality. It's a technique often used in abusive relationships as a method of control, but a lot of people now use it to mean "lying to you".
This may not exactly be gaslighting, since they may not have the intent of making you question your memory but ...
It’s not unusual to do it in C or C++, just because the dependency management story sucks so bad (there are a lot more options than there used to be, but still no single de facto standard like pip or cargo).
It is well known that committing dependencies is a bad thing. See sibling posts. It's worth considering what ignoring that best practice gets you.
For example, you've been handed a bug. The customer is important and is running a version of your code from three years ago. You have source control, so you check it out and try to build it. What stuff might you expect?
1/ It uses docker and the image isn't online any more. I've had this one.
2/ One library dependency you used to use has been deleted from the internet. Also had this.
3/ Another dependency is still available, but it uses a dependency which isn't. Not yet.
4/ You managed to gather all the code and it refuses to compile with a modern toolchain
5/ As above, but this time the modern toolchain makes a different program to last time
6/ Another dep has dubious ideas of semver and the current copy doesn't behave like the old
7/ Actually anything using semver is considered deeply suspicious in itself
That's off the top of my head. I think there's probably a long list of variants on "the source tree isn't sufficient information to recreate old versions". The reliance on old compiler bugs feels particularly realistic to me, but then we C++ people mostly check in our dependencies. I've definitely checked out npm projects from a few months earlier and discovered they don't run anymore.
Compare to the silly, paranoid, I've-checked-in-gcc-and-linux alternative. You check out code from N revisions ago and it all builds and runs, exactly like it used to, provided you can find hardware which looks adequately the same as it used to. I've heard rumours of warehouses of new-in-box sun workstations waiting for their time to replace the current ones too.
On balance, I reckon the industry best practice of grabbing whatever code some server gives back with an associated version number is a nonsense and obsessively committing the entire dev and run state into source control is the right thing. But I'm clearly in a minority.
It depends. You may want to protect against the dependency disappearing from a public repository, or being changed by a malicious actor, or your internal repo is faster to clone and build, or... I'm just saying there are very valid reasons to vendor a dependency. There are also drawbacks: some folks vendor and then make small modifications... that's forking, good luck keeping it up to date. You also have more work to do to vendor new versions but that's easily automated.
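One way to automate part of that, assuming vendored code lives under a single directory: record a SHA-256 hash of every vendored file when you vendor a new version, commit that manifest, and have CI check that nothing has changed since. This catches both tampering and the "small modifications" that quietly turn a vendored copy into a fork. The function names and directory layout are hypothetical; for packages fetched from a registry, pip's hash-checking mode and the checksums in Cargo.lock serve a similar purpose.

```python
import hashlib
from pathlib import Path


def build_manifest(vendor_dir):
    """Map each file under vendor_dir (as a relative POSIX path) to its SHA-256 hex digest."""
    root = Path(vendor_dir)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            manifest[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return manifest


def verify_manifest(vendor_dir, manifest):
    """Return sorted relative paths that were added, removed, or modified since the manifest was built."""
    current = build_manifest(vendor_dir)
    changed = {
        p for p in manifest.keys() | current.keys() if manifest.get(p) != current.get(p)
    }
    return sorted(changed)
```

The manifest could be committed next to the vendored code (for example as JSON); a non-empty result from verify_manifest means someone should look before merging.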
Common? Perhaps in less than ideal or legacy situations. Best practice, definitely not.
A lot of website projects I have worked on in the past included composer or node dependencies in the repository. It really slows down the whole git system.
I think folks should go easier on this developer... The revert commit is always available, and sometimes products do gain unnecessary baggage that just slows down future product development. Kudos for trying.
Can someone else smell BS? 50K deletions in 3 days? It would take at least a month just to read 50K lines, let alone understand what they do and figure out how deleting them will affect the system.
Seems like this is an author of the product. They are likely very familiar. I’m pretty sure I can delete whole modules of my code base and know what is going to be affected.
I've worked on the codebase since day one. We structured it carefully so that modules didn't depend on one another. All shared components were in a separate folder. The structure of the web application lends itself naturally to such code organization.
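If modules are only allowed to depend on a shared folder, that rule can be checked mechanically, and the same check tells you whether a module can be deleted without breaking another one. A minimal sketch, assuming each file's imports have already been extracted and resolved to project-relative paths, and assuming a layout of modules/<name>/... plus shared/...; the layout, file names and function names are all hypothetical, since the comment doesn't say how the structure is enforced.

```python
def module_of(path):
    """Return the module name for a path like "modules/<name>/...", else None."""
    parts = path.split("/")
    if len(parts) >= 2 and parts[0] == "modules":
        return parts[1]
    return None


def cross_module_imports(imports, shared_prefix="shared/"):
    """Find imports where one module reaches into a different module.

    imports: mapping of source file path -> iterable of imported paths.
    Returns a sorted list of (source_file, imported_path) violations.
    """
    violations = []
    for src, targets in imports.items():
        src_module = module_of(src)
        for target in targets:
            if target.startswith(shared_prefix):
                continue  # shared components may be used by anyone
            target_module = module_of(target)
            if target_module is not None and target_module != src_module:
                violations.append((src, target))
    return sorted(violations)


imports = {
    "modules/billing/page.js": {"shared/button.js", "modules/billing/api.js"},
    "modules/reports/page.js": {"modules/billing/api.js", "shared/chart.js"},
}
print(cross_module_imports(imports))
# [('modules/reports/page.js', 'modules/billing/api.js')]
```

With no violations, deleting a module's directory cannot break an import elsewhere; here, deleting "billing" would break "reports".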
Reading the story, and given that Jovian is a platform for working with Python notebooks, I can relate to the author's use case. This also doesn't appear to be some random hack brought in to decimate the codebase: it's someone who has worked on this since 2019, so they should be aware of the importance of what was deleted.
Their approach is quite reasonable as a starting point: make a non-breaking product change and see what happens. You may be amazed at what (doesn't) happen.
Not sure what kind of company they work for, but if it is a legitimate business with real customers then higher ups should be horrified that a developer is randomly deleting in-use production features under the guise of "code cleanup". Whether it is accessed by 50% or 0.5% of users is irrelevant. That is not a measure of how important something is in the codebase or the product.
We have a data export form on our website (which is a requirement under GDPR and several other similar regulations) that has been used by under 0.01% of users throughout its existence. If a new hire decided to "clean it up" without approval and then bragged about how many lines of code they were able to reduce...
A part of my normal implement-a-feature routine is the "normalisation" phase where I go over the rest of the application, and make sure that other areas of the application use new idioms.