Hacker News
Deleting 50k Lines of Code in 3 Days (aakashns.com)
33 points by aakashns 919 days ago
14 comments

> While I knew that some pages were less frequently visited than others, I was surprised to see that there were modules that accounted for less than 0.1% of page visits. This meant I could remove them entirely without affecting 99.9% of users. I could delete entire directories containing dozens of files and thousands of lines of code.

I don't know about the author's application, but a data driven approach is not going to give good results. In my podcast hosting software, the "delete podcast" button is used maybe once every other week (by the numbers). But if I got rid of it, I'd be fielding support requests for every user who wants to delete a show. And that's a highly manual cleanup process to do by hand.

Which is to say, features don't need to be a high percentage of traffic to be important.

Yeah, this is a terrible way to do refactoring. And I mean, "page visits"? In theory all users could be affected by the removed features.

The 0.1% and 99.9% are not complements!

"Data driven" development is so much BS since so many devs don't care to think properly about sampling and statistics.

It makes me wonder how many people hit the checkout flow on some websites, I'm sure the numbers for some people (automakers maybe) aren't too dissimilar.
We do have a checkout flow, and yes, that page is visited by a very small number of users. However, you don't actually think I deleted the checkout flow, do you?
No of course not, my point is that sometimes pages, code, and flows that are used by a minority of users are vital to business operations.
This is the line that I came to the comments looking for.

It's possible that the author did take this into consideration and glossed over the details, but it's not something that I as a reader take for granted. And it's not something that I as a coworker would take someone's word for, especially if they didn't list at least some examples of the things being removed.

Not trying to pile on the author. Trimming the fat on a codebase is great, and for sure a worthwhile exercise that not enough organizations pay mind to.

Our situation is a bit unique, because we're currently a two person company. I'm familiar with the product, our users, and the codebase, which allowed me to move quickly. This probably wouldn't work at a larger organization.
Also, 0.1% of users seems small, but looking at the actual number of users this represents is more important.

> web application that serves hundreds of thousands of requests every day

So in this case, that's at least hundreds of requests per day. I mean, it's not small.

I agree completely with this comment, and another comment saying that "which percent contains the value of the product", because that might be the most important 0.1%.

I didn't say I deleted everything that had less than 0.1% of page visits. :)

For instance, I didn't delete the "Account Settings" page. Features don't need a high percentage of traffic to be important, but less important features generally account for a very low percentage of traffic.

It's not a criticism of you, and I'm giving you the benefit of the doubt! But I think there's plenty of folks who might take away the wrong lesson here.
If you could somehow track the frequency spectrum of usage, then you could mostly handle that problem. "Oh, the frequency spectrum strongly peaks at about once a week? I guess users regularly need to use this feature. Probably important somehow." vs "Huh, the frequency spectrum isn't strongly peaked anywhere, but is substantially lower at high frequencies. We can probably remove this."
Imagine a time keeping system where employees regularly enter their hours and schedule time off. But somewhere is a button or screen that gets used once a week called "run payroll". Aside from being periodic and not available to everyone, no metrics can capture how important it is.

Nevertheless, my favorite thing to do with code is running over it with the delete key. Just be careful about what gets it ;-)

that's a great example!

And this is why the users are the ultimate deciders - or at least should be - what features ought to be removed, rather than the developers.

I think the parent means that the mean time between podcast deletions is two weeks, not that they are spread exactly two weeks apart. If the deletions are random but on average every two weeks, then your frequency spectrum will show nothing meaningful either.
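
To make the two comments above concrete, here is a rough sketch of the "frequency spectrum" idea using a plain DFT over daily usage counts. The 56-day window, the event days, and the function names are all made up for illustration. It compares a feature used exactly once a week with one used just as often but at irregular times.

```python
import cmath

def power_spectrum(daily_counts):
    """Return {k: power} for k = 1..N//2, where k is cycles per N days (plain DFT)."""
    n = len(daily_counts)
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(c * cmath.exp(-2j * cmath.pi * k * t / n)
                    for t, c in enumerate(daily_counts))
        spectrum[k] = abs(coeff) ** 2
    return spectrum

def events_to_counts(event_days, n_days):
    """Turn a list of day indices into a per-day usage count."""
    counts = [0] * n_days
    for d in event_days:
        counts[d] += 1
    return counts

N_DAYS = 56  # eight weeks, a hypothetical observation window

# "Run payroll": used exactly once a week.
weekly = events_to_counts(range(0, N_DAYS, 7), N_DAYS)
# "Delete podcast": also 8 uses, but on irregular (made-up) days.
irregular = events_to_counts([1, 4, 12, 13, 27, 30, 41, 55], N_DAYS)

weekly_spec = power_spectrum(weekly)
irregular_spec = power_spectrum(irregular)

# Strongest component; rounding keeps float noise from deciding between
# exact ties (the weekly signal has equally strong harmonics).
peak_k = max(weekly_spec, key=lambda k: round(weekly_spec[k], 6))
print("weekly peak period (days):", N_DAYS / peak_k)
print("weekly peak power:", round(weekly_spec[peak_k], 3))
print("irregular peak power:", round(max(irregular_spec.values()), 3))
```

The weekly feature peaks at a 7-day period, while the irregular one has no component as strong, even though both were used the same number of times. So a spectrum can flag strictly periodic features, but it says nothing about how much an irregularly used feature matters.
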
> modules that accounted for less than 0.1% of page visits. This meant I could remove them entirely without affecting 99.9% of users.

No, it doesn't. These are different metrics: the same user who makes those 99.9% of visits could, once in a blue moon, want to visit a very important page and be negatively affected. And this could (in theory) be the case for every single user.
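
A minimal sketch of that difference, assuming a hypothetical visit log of (user, page) pairs. The page names and counts are invented. It computes each page's share of visits and its share of distinct users separately.

```python
from collections import defaultdict

def page_stats(visits):
    """visits: iterable of (user_id, page) tuples.
    Returns {page: (share_of_visits, share_of_users)}."""
    visit_counts = defaultdict(int)
    users_by_page = defaultdict(set)
    all_users = set()
    total = 0
    for user, page in visits:
        visit_counts[page] += 1
        users_by_page[page].add(user)
        all_users.add(user)
        total += 1
    return {
        page: (visit_counts[page] / total,
               len(users_by_page[page]) / len(all_users))
        for page in visit_counts
    }

# Hypothetical log: 10 users, each views the dashboard 1999 times
# and the export page exactly once.
log = []
for user in range(10):
    log += [(user, "/dashboard")] * 1999
    log.append((user, "/export"))

stats = page_stats(log)
visit_share, user_share = stats["/export"]
print(f"/export: {visit_share:.2%} of visits, {user_share:.0%} of users")
# prints "/export: 0.05% of visits, 100% of users"
```

In this made-up log the export page is under 0.1% of page visits, yet removing it would affect every user.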

The Word screenshot is another illustration of the misplaced nature of this criticism: you wouldn't enable all of those toolbars in reality to obscure everything! If you wanted something, the beauty is you could just drag a needed button from a toolbar to a more condensed version

Or enable a toolbar for when you needed it, and then disable it

Isn't the standard corollary to that "80% of features not being used" statement that it's a different 80% for every user?
Seems to be from Spolsky here: https://www.joelonsoftware.com/2006/12/09/simplicity/

"A lot of software developers are seduced by the old ‘80/20’ rule. It seems to make a lot of sense: 80% of the people use 20% of the features. So you convince yourself that you only need to implement 20% of the features, and you can still sell 80% as many copies."

"Unfortunately, it’s never the same 20%. Everybody uses a different set of features."

Yeah, I'm thinking about the apps I use all the time and the features that I rarely use but would miss.

Like reset password... or account cancellation ... stuff like that.

We have a Linux user self-serve page where we can reset our passwords when they expire. The catch is we never use said password because all logins are done with keys, but you can't log in if your password is expired. So imagine someone "data-driven" getting rid of that page because it is rarely used.

Why do the passwords expire if they're not used? "best practice"

Agree with both you and above that low usage is not a good signal of unimportance, but “reset password” is actually used _very_ frequently across the spectrum of average joe users. Here’s an example source: https://www.statista.com/statistics/1303484/frequency-of-pas...
I'm torn; lots of angst about 'but somebody might one-day need that feature!'

Isn't this exactly how you get feature creep? It's not a fair argument, no more fair than 'it's only used by 0.1% of pageviews'. Neither is the full story.

If there's another way to accomplish the same thing, remove the more complex one. If the feature is part of a process that can be done another way, remove it. If the feature is used by an actual 0.1% of users, the impact of removing it is small.

Anyway, I had a friend in the bad old days, had a bulletin board (a bank of phones connected to modems that connected to a bank of computers) that hosted around 300 games. Folks would log in, play a game or two, log out.

He checked; only 10 games ever got played, pretty much. So he started removing most of the rest.

Callers declined 80% in the first week. Disaster; they paid by the minute.

See, folks were browsing his games, the most on any bulletin board! That's why they came to him, to see all that.

Then, sure, they'd play the same popular games nearly every time. But they had to see the other ones there to feel like it was the right place to go.

Sometimes, it's not about the feature being used. It's about the user's confidence they won't get stuck (I can always back out this change! Oh! The backout button is gone?! Panic), or feel the product is supported adequately, or even, it's a checkbox on a purchase requirements list.

Remove features at your peril!

Reminds me of the Joel on Software article talking about the 80/20 myth and why it’s a bad idea to take this path.

https://www.joelonsoftware.com/2001/03/23/strategy-letter-iv...

> A lot of software developers are seduced by the old “80/20” rule. It seems to make a lot of sense: 80% of the people use 20% of the features. So you convince yourself that you only need to implement 20% of the features, and you can still sell 80% as many copies.

> Unfortunately, it’s never the same 20%. Everybody uses a different set of features. In the last 10 years I have probably heard of dozens of companies who, determined not to learn from each other, tried to release “lite” word processors that only implement 20% of the features. This story is as old as the PC. Most of the time, what happens is that they give their program to a journalist to review, and the journalist reviews it by writing their review using the new word processor, and then the journalist tries to find the “word count” feature which they need because most journalists have precise word count requirements, and it’s not there, because it’s in the “80% that nobody uses,” and the journalist ends up writing a story that attempts to claim simultaneously that lite programs are good, bloat is bad, and I can’t use this damn thing ’cause it won’t count my words. If I had a dollar for every time this has happened I would be very happy.

Just because a function is “seldom used” shouldn’t schedule it for execution.

Users might seldom use something but really need it.

It's also gaslighting ... I'm also imagining a whole bunch of users trying to find some rarely used feature that they remember being there, isn't in the menu anymore and questioning their sanity.
> isn't in the menu anymore and questioning their sanity

Pretty much every OS update and new A/B tested web page out there...

OS updates are very different, I know when I upgraded my phone/computer.

A/B testing is really close to the same as this ... except one of A or B may stay in the product.

> It's also gaslighting

You mean it's an abusive relationship between the developer and the users?

He means it makes people question their sanity because they could have sworn the feature was right there.
Thanks, never saw that term used in this way.
gaslighting is deliberately doing things to make someone question their memory, sanity or perception of reality. It's a technique often used in abusive relationships as a method of control but a lot of people now use it to mean "lying to you".

This may not exactly be gaslighting, since they may not have the intent of making you question your memory but ...

Like their brain? /s
I accidentally imported a 3rd party library twice. Deleted the extra. Got a nice badge for deleting over 500k lines of code.
Is it a common practice to commit dependencies into the project repo? If so what type of projects do that?
It’s not unusual to do it in C or C++, just because the dependency management story sucks so bad (there are a lot more options than there used to be, but still no one de facto standard like pip or cargo).
There's this concept of a shared library, you should check it out.
How does compiling your dependency differently help with dependency management?
That's the neat part: you don't

You do not compile your dependencies

It is well known that committing dependencies is a bad thing. See sibling posts. It's worth considering what ignoring that best practice gets you.

For example, you've been handed a bug. The customer is important and is running a version of your code from three years ago. You have source control, so you check it out and try to build it. What stuff might you expect?

1/ It uses docker and the image isn't online any more. I've had this one.

2/ One library dependency you used to use has been deleted from the internet. Also had this.

3/ Another dependency is still available, but it uses a dependency which isn't. Not yet.

4/ You managed to gather all the code and it refuses to compile with a modern toolchain

5/ As above, but this time the modern toolchain makes a different program to last time

6/ Another dep has dubious ideas of semver and the current copy doesn't behave like the old

7/ Actually anything using semver is considered deeply suspicious in itself

That's off the top of my head. I think there's probably a long list of variants on "the source tree isn't sufficient information to recreate old versions". The reliance on old compiler bugs feels particularly realistic to me, but then C++ people mostly check in our dependencies. I've definitely checked out npm projects from a few months earlier and discovered they don't run any more.

Compare to the silly, paranoid, I've-checked-in-gcc-and-linux alternative. You check out code from N revisions ago and it all builds and runs, exactly like it used to, provided you can find hardware which looks adequately the same as it used to. I've heard rumours of warehouses of new-in-box sun workstations waiting for their time to replace the current ones too.

On balance, I reckon the industry best practice of grabbing whatever code some server gives back with an associated version number is a nonsense and obsessively committing the entire dev and run state into source control is the right thing. But I'm clearly in a minority.

The type of projects where you want to be able to run them in 20 years.
I think you meant run away from them, and immediately.
It depends. You may want to protect against the dependency disappearing from a public repository, or being changed by a malicious actor, or your internal repo is faster to clone and build, or... I'm just saying there are very valid reasons to vendor a dependency. There are also drawbacks: some folks vendor and then make small modifications... that's forking, good luck keeping it up to date. You also have more work to do to vendor new versions but that's easily automated.
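
For the "changed by a malicious actor" case, one lighter-weight alternative to vendoring is to record a digest of the dependency next to its version and refuse anything that doesn't match. A rough sketch using only `hashlib`; the archive contents and names here are placeholders, not any particular package manager's mechanism.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()

def verify_artifact(data: bytes, pinned_sha256: str) -> bool:
    """True if the artifact matches the digest recorded when it was pinned."""
    return sha256_hex(data) == pinned_sha256

# Hypothetical dependency archive, stood in for by a byte string.
original = b"libexample 1.2.3 source archive"
pinned = sha256_hex(original)  # would be committed alongside the version number

# Same version number, different bytes served later.
tampered = b"libexample 1.2.3 source archive (modified upstream)"

print(verify_artifact(original, pinned))  # True
print(verify_artifact(tampered, pinned))  # False
```

This only detects a change. It does nothing if the dependency disappears, which is the case vendoring actually covers.
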
I think projects in C would be a good contender here.
Common? Perhaps in less than ideal or legacy situations. Best practice, definitely not.

A lot of website projects I have worked on in the past included composer or node dependencies in the repository. It really slows down the whole git system.

This was at a big tech codebase. Everything is imported into the mono repo so it can be consumed by the Almighty Buck build system.

Python, C, Cpp.. it all goes in!

Someone somewhere is cursing that the entire reason he/she is using this product is now gone.
I think folks should go easier on this developer... The revert commit is always available, and sometimes products do gain unnecessary baggage that just slows down future product development. Kudos for trying.
Haha, thanks! Yes, as I've said in the post, the idea is to migrate the codebase to a new stack quickly, and add back some features.
Can someone else smell BS? 50K deletions in 3 days? It will take at least a month just reading 50K lines, let alone understanding what they do and figuring out how deletion will affect the system.
Seems like this is an author of the product. They are likely very familiar. I’m pretty sure I can delete whole modules of my code base and know what is going to be affected.
I've worked on the codebase since day one. We structured it carefully so that modules didn't depend on one another. All shared components were in a separate folder. The structure of the web application lends itself naturally to such code organization.
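
That kind of structure can also be checked mechanically before deleting a module. A minimal sketch for a Python codebase, using the standard `ast` module to list which feature modules import a given one. The module names and sources below are hypothetical and say nothing about the actual application's layout or language.

```python
import ast

def imported_names(source: str) -> set:
    """Top-level package names imported by a piece of Python source."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            names.add(node.module.split(".")[0])
    return names

def dependents_of(target: str, modules: dict) -> list:
    """Modules (other than target) whose source imports target."""
    return sorted(name for name, src in modules.items()
                  if name != target and target in imported_names(src))

# Hypothetical layout: feature modules plus a 'shared' package.
modules = {
    "forum": "from shared import ui\nimport shared.auth\n",
    "notebooks": "from shared import ui\n",
    "profiles": "import notebooks\nfrom shared import ui\n",
    "shared": "import json\n",
}

print(dependents_of("forum", modules))      # [] -> nothing imports it
print(dependents_of("notebooks", modules))  # ['profiles']
```

An empty result only means no static import points at the module. Dynamic imports, routes, and templates would need their own checks.
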
Reading the story and looking at Jovian being a platform for working with Python notebooks, I can relate to the author's use case. And this also doesn't appear to be some random hack brought in to decimate the code base. It's someone who has worked on this stuff since 2019, so they should be aware of the importance of the things deleted.

Their approach is quite reasonable as a starting point. Make non breaking product change and see what happens. You may be amazed what (doesn't) happen.

Not sure what kind of company they work for, but if it is a legitimate business with real customers then higher ups should be horrified that a developer is randomly deleting in-use production features under the guise of "code cleanup". Whether it is accessed by 50% or 0.5% of users is irrelevant. That is not a measure of how important something is in the codebase or the product.

We have a data export form on our website (which is a requirement under GDPR and several other similar regulations) that has been used by under 0.01% of users throughout its existence. If a new hire decided to "clean it up" without approval and then bragged about how many lines of code they were able to reduce...

Brings to mind this story from the development of the Lisa at Apple in the early 80s: https://www.folklore.org/StoryView.py?story=Negative_2000_Li...
A part of my normal implement-a-feature routine is the "normalisation" phase where I go over the rest of the application, and make sure that other areas of the application use new idioms.
Huh, this is exactly how Evernote ported their application. Removed a bunch of features that paying customers only used "sometimes".

After a couple of years the feature set has mostly returned.

I cancelled my subscription and mostly moved to other apps. I still utterly despise everybody at Evernote.

And they focused on fancy notebooks and pens instead of what had been an amazing and market leading piece of software.

It still hurts a little every time I think about it. I really liked that product.