Hacker News new | ask | show | jobs
by dep_b 3052 days ago
You need to get the programmers together and agree on a coding guideline, a linter and other architectural fundamentals.

Then the gardening phase begins, which means no refactoring or warning or style fixes unless you touch the code to do something meaningful like a feature or bug fix. Leave everything you touch better than before. Dissolve big classes and functions into smaller ones. Write a test or two for the smaller side effect free function you just created.

It takes a ton of time and patience but every week the kraken loses a tentacle. And once most code feels untangled you can embark on a refactor since all of the parts are movable now.

3 comments

This is absolutely the situation I am in right now, and I desperately need all the guidance I can get. We’re talking thousands of lines of React components that rely regularly on manipulating global js variables, mixed with MVC 5 written in Razor delivering variables in script tags, mixed with outdated jQuery, that’s all just concat’d together with gulp and that hasn’t seen a refactor in years of changing hands with no style guide. The company started in 2004 and I swear some of this code is still floating around from the day it started. We have three sets of scss breakpoints, three or four layers of “common.scss” files that act like a piece of furniture that has been painted over dozens of time (with shit like * {padding:5px} floating around), and an unknown amount of dead code. There’s lines written back-to-back with vastly different styles (with a random distribution of semi-colons and arrow functions and etc), and major global variables named with misspellings. And while I clean this all up, I am expected to continue pushing out features and addressing serious bugs with stop-gap measures.

Please advise.

Welcome to code development in the real world 101.

You either go insane, give up and take up beekeeping, start your own company or finally decide to not give a shit anymore and start hacking the same way as everybody else because in the end your are not sending astronauts to Mars, and it works "most of the time" is good enough for the clients and the price and time they are willing to pay.

> take up beekeeping

I thought the idea was to have less bugs to worry about...

Yeah. On the whole, it is less. ;)
Real world 303.

Work for a company and project sending astronaut to Mars, then you can care and be surrounded by coworkers who care too.

You need to tell your manager that it's simply impossible to both clean all that up and also make progress on the project. You need to stop the erosion in order to fix it; fixing will require fundamental changes that will absolutely block progress. If aren't able to freeze the project, your cleanup progress will be constantly thwarted by new ugly hacks.

There's just no way around it. If your managers won't freeze the project until you make meaningful progress, you're going to be stuck working with this ugly code for a long time. Try to convince them that in the long-term a cleanup will pay dividends. Without their support though, you won't get anywhere.

If you're able to freeze the code for a bit, it sounds like it might just be quicker to rewrite some parts like the CSS. Don't take on the whole project at once - find pieces that can be rewritten individually, even if it's a large part like all the styling.

I disagree with most of this comment, but I think it's an interesting and valuable discussion.

Here's where we agree: You need to stop the erosion in order to fix it. Every developer on the project needs a shared vision of what "better" means. For example, the parent comment mentions React components that manipulate global variables. First of all, everyone agrees to stop doing that, and agrees on the alternative (use props, a redux store, whatever).

But you don't stop the world and rewrite to the new standard. Instead, you apply a strong "boy scout rule" to every code change. If a developer modifies a small component, they also need to remove the global references. If the existing component is large, the developer must extract a smaller one, and the new extracted component must meet the new standard.

This will slow your velocity - at first, by a lot. But you never stop delivering entirely. You never waste time rewriting things that you would never touch again anyway. You also learn along the way - because while "don't use globals" is pretty universal, most architectural decisions are not. You will find that the "clean new standard" changes to match your application. At the same time, everyone on the team learns about the new standard, and buys in to the new way of doing things. There isn't a "new system" with the cool kids, while the B-team strings along the old one and waits to be laid off.

That's what I've always done. Fix it when you see it. Eventually, you have a clean codebase.
I've mixed feelings on this still. It leads to big code reviews, with most of the changes being less important. So I rather like the separation, spend some time just for cleanup. Maybe take notes on wins as you see them but you don't need to do them yet (or put them in the same review as another bug fix / story item).

If you're using git and can separate things out cleanly then I think fixing it as you go would work better. I yearn to use git every day again...

Well, I'm not talking about a major refactor (squash them when you see them). This should be an enum, this is named poorly, this variable is scoped wrong, inline sql when should be a sproc, that kinda stuff. A few lines a fix, the low hanging fruit.

Major refactors certainly need to be accounted for in a sprint(s), but keep features a priority when needed in sprints.

Unfortunately there are business reasons why freezing the code is impossible. We’re ecommerce and working on a timeline dictated by consumer behavior. So if we need a promo tomorrow it’s going live or else we’re not making budget. I kind of assume this is how we got into this state to begin with.
And unfortunately nothing will change because what you want to do is better in every single way, but their way of doing thing is better in the only one that actually counts: It's faster and cheaper.
My assumption is that I will never clean up this entire codebase. Rather, I’m looking for the best way to triage the situation: what are the things I can/should try to fix first?
Make your problem their problem.

Make a list and organize all the things you'd want to change. Untangling balls of mud often take on the order of months to start seeing any kind of progress. You have to make sure you keep things organized over a long time otherwise you can't tell whether or not you are making progress.

If a module breaks often when seemingly unrelated things change or breaks every time someone touches it it's easy to say the module is badly written. It's often hard to say why it's bad and harder to say how to fix it. However at least you will have data and be able to defend your position that messy code in a particular module actually slowed down the addition of a new business features.

If you are using a bug tracking tool (Jira, Bugzilla etc) make a bunch of private ones for yourself. If management doesn't like all your private tasks showing up on reports then try to get a section for yourself or see if you can deploy your own "private" bug tracking instance.

Any time you encounter a problem make a bug for it even if you have no solution. If upgrading a library version a minor version causes a break make a bug for it etc. You'll start to see patterns and then you can make small but really important surgery on the code. You can have a plan for how to fix a particular module when the time arises.

For example we had a core module with some 190 downstream internal clients across some 5 different dev teams. A recent Oracle version upgrade caused problems because we would have to to coordinate everyone to update at the same time. I already had a long time standing issue that the module is poorly organized and already had a list of all the things that were wrong with it. When this issue came up I immediately stepped in and how to completely remove Oracle from the core module so that the downstream clients can each upgrade on their own time. Because I had thought about this for time I was ready to defend the proposal by showing it would take less time to do this. I was able to do a refactoring I've been wanting to do. Over time people will start to listen to you more if you are actually delivering shit on time. Otherwise, even if your complaints are correct, you will be labelled as just a whiner.

Write down everything you know about the codebase. Whenever you discover a surprising dependency or behavior make a note of it. This is something you can do in passing as you push features.

At some point you'll start recognizing structures in the code some of which will be small enough to surgically improve.

How will you make sure that you are not introducing regressions in such code base? You cannot add unit tests because it is probably untestable mess. So instead of some uncertain dividends, which may or may not materialize, you introduce real risk of breaking already existing stuff.
For "light reading" on this subject, try _Working Effectively With Legacy Code_ by Michael Feathers. We used it to very good effect at my last company.
Michael Feathers was on Software Engineering Radio last year: http://www.se-radio.net/2017/06/se-radio-episode-295-michael...
Personally I thrive in this kind of environment, it's a kind of comfortable chaos. MVC5 is positively modern compared to many of the products I've worked with which have included (within the last 5 years) VB6 winforms but mostly aspx webforms.

Understand that previous developers might not know what a DOM is have likely written ad-hoc jquery snippets, included 3 different versions of jquery-ui until they happened across the right combinations of versions to get their plugins to play and will have a mentality of "it was working".

Given that, here's some advise on how to stay sane:

1. Read and "Digest Working Effectively with Legacy Code" but accept you'll likely not be empowered to actually implement the changes.

2. Understand you'll have a grace period of ~6 months where you'll be far more empowered to make changes and get suggestions pushed through, after which you'll likely find yourself worn down by the system and more accepting of the status-quo. Leverage this time if you can.

3. Try to identify what caused the product to get to the state it is so it can be avoided for future products. If it's something you're not empowered to change such as "bad hiring policy" try to work out how you can spot that in future interviews.

4. Be pro-active in looking for bugs, particularly security flaws. Finding an IDOR which lets you get cross-account data is usually fairly straightforward in these kinds of environments. Doing so gets you noticed so you're not just another cog in the system and can provide real value. Be careful not to put others' out too much while doing this. It sounds like you only work on the front-end but even what appears to be front-end can have security implications such as template-injection.

5. Draw a line in the sand. Be entire accepting in legacy code but if someone is working on a new part of the product or is fixing a bug in legacy code be absolutely ruthless in what you'll let passed in code review. This may not do you any favours socially but a reputation for harsh code reviews isn't all bad. For one thing you'll have to do fewer code reviews.

6. Don't bite off more than you can chew. Treat it like a giant knot and don't be tempted to pull at threads unless absolutely necessary. Instead wait until given strong reasons for refactoring.

Edit: formatting, because HN doesn't do markdown.

In regards to #5, worth pointing out that some coworkers are probably just as disappointed with the current state of affairs as you. They would probably really appreciate somebody pushing them to hone their craft.

Also worth considering that some people never learned how to handle constructive criticism and will take any suggestions as personal attacks. Stay as far away from these people as possible, you aren't going to change them but their pettiness can make your life miserable, particularly if they're closer to management than you are.

It should be noted that more often than not there are strong reasons besides ignorance and incompetence for a project to arrive at a certain state.

https://en.m.wikipedia.org/wiki/Wikipedia:Chesterton%27s_fen...

It shoud be stressed that a developer should not assume all his predecessors were wrong just because he doesn't understand why some stuff ended up being the way it is. The world is packed with developers who proud themselves of fixing things that were never broken, and in the process break stuff.

> if someone is working on a new part of the product or is fixing a bug in legacy code be absolutely ruthless in what you'll let passed in code review. This may not do you any favours socially but a reputation for harsh code reviews isn't all bad.

In my experience this is the most important thing. I particularly focus on the "what the heck does this code even do" aspect - the person who is modifying the code needs to be able to explain exactly what the code does and why it is correct (don't accept "it compiles and the one unit test we have passes"). Forcing the team to understand the code will make it very difficult for them to continue adding bad code to the pile, as they won't be able to explain what they bad code does, or will realize the code is bad in the course of trying to explain it.

It is very, very important in this situation, however, to not be outright mean. Do not let anything through that you would not be comfortable maintaining, but don't be a dick about it. All that will end up accomplishing is that you'll be blamed for things not going through, and then you'll lose your credibility as steward of the code. People will go over your head to the boss, who will just make you pass it.
Unfortunately 5. doesn't work if your company doesn't do any code reviews.
You should trust your engineering judgement, and your tech lead should listen to you when you raise these issues, and your product team should give your team engineering time to start cleaning things up. If any of this doesn't happen, get a different job.

As engineers we have a responsibility to raise concerns about bad practices and projects that have been neglected this way. Doctors raise concerns about hospitals, lawyers raise concerns about firms and enterprises, structural engineers raise concerns about buildings and infrastructure. Software engineers should do the same, resigning in protest if need be.

First agree on what it should actually be and document that. Make sure nobody commits anything that makes the codebase worse. Don’t refactor but clean first, and only while delivering value.
Rule 1: No new bullshit hacks make it in.

Rule 2: One bullshit hack must be removed per sprint.

Over time, you'll have better software.

IME linting picks up about 99% fluff (ooh look there's some whitespace on a blank line) and 1% potential issues highlighted (usually minor).

The reason people like to default to linting when "fixing code quality issues" is purely because it's 100% objective and measurable, not because it picks up the truly important issues. It doesn't.

Edit: I am aware that coders have OCD impulses which can be sated with a linter, but priority #1 when fixing spaghetti code is unraveling the spaghetti, not sating your need to see }s in a place that makes you feel better.

I'd argue that there's a significant mental overhead reading code when it doesn't follow the same conventions, and that enforcing a style guide is a matter of having no broken windows. But of course there's no point trying to make shit look good.
A linter and code conventions are a way to make code more consistent and easier to switch between. At times it becomes hard to see what code you wrote and what somebody else wrote, which is a good thing. If I open a file and everything "just feels wrong" but I can't change it because enforcing my personal preferences are just my personal preferences is just making things worse that really takes some valuable energy away I could use for actually improving things. But I can get used to a coding guideline that's not 100% my preference but a well enforced and consistent one. Knowing that if I fix indenting and naming the fix will not enrage someone else.
At times it becomes hard to see what code you wrote and what somebody else wrote, which is a good thing.

I suspect this is the big point of disagreement between lint-advocates and many of those who are dubious. I generally don’t see big wins from collectivising code, and on the who prefer to treat people I’m working with as individuals who can be interacted with 1:1 vs an amorphous “the team” writing “the codebase”.

Usually there’s a style that’s mandated by large authorities like C# formatting is mandated by Microsoft and JetBrains. Copying their style gets you there for 80%. The rest is whatever you agree on and usually not something that bothers me.

If you’re the guy that thinks having a 300 line function in an 8000 line class is just an individual way of expressing yourself you’ll have to fight me.

If you think real tabs is better than 4 spaces I shrug and let you have it your way.

The fact that you’ve named an IDE vendor rather emphasises that there is a substantial gray area. By no means everyone wants to use an IDE (let alone that specific one...)
We did that (rubocop). Lots of fluff, a dozen or three minor to medium issues, and a couple of downright critical ones that slipped through production cracks out of sheer luck. Not even counting the last ones, the holistic effect of having a code base normalized is unmeasurable (i.e == through the roof), because suddenly moving from one part of the code to another is consistent enough that it removes a mental barrier to seeing patterns and enabling refactoring.
IMO there are two features which linting tools need before I can start treating them as serious "necessities" on a code base rather than "nice to haves".

  lint [cosmetic | medium | serious]
- you must select one to run the tool, and the rule defaults must be sensible. No "100 character line length" in 'medium'. That's strictly cosmetic. Serious is for initialized and unused variables and the like - things which you might plausibly say "I'm not unhappy that broke the build because it could have indicated a bug".

  lint medium --against=branch
- the linting tool should be able to intelligently compare against a baseline branch and only warn you about issues on code you have actually touched.

I do not agree that the holistic effect of having a code base "normalized" is "through the roof". I think the cost is relatively high while the benefits are relatively low. Right now at work if I run a linter against my code there are about 10,000 "issues" - the top 50 of which are completely cosmetic and I'd be remiss to invest time in fixing them.

I'm not opposed to linters on principle and I do use them, I just think that there's almost always bigger fish to fry.

In this case, rubocop is indeed configurable: https://github.com/bbatsov/rubocop/blob/master/manual/config...
pylint has different levels you can select (convention, refactor, warning, error), not sure if it's a common feature.

For the second, see git-lint: https://github.com/sk-/git-lint

Regarding the linting levels, for eslint you can configure any rules. Just use different .eslintrc and you should be all set.

Regarding the changeset, I think you could roll that pretty easily with something like `git diff --name-only | xargs eslint`.

A lot of cosmetic rules can be fixed automatically (e.g. `eslint --fix`). In that case the cost of having them on is near-zero, so even though the benefits are pretty mild the ROI is still positive.
Yep, a tool that did all of this might be enough to convince me.
I agree, linting doesn't pick up much. But, at places I've worked where the codebase was kept linted, the humans did a better job of catching bugs in code review.

My guess is that, when you have a cosmetically messy code base, people start to develop a lower standard for what counts a understanding the code they're reading. Probably as a self-preservation mechanism. There's only so much time in a day.

Also: if your coworkers care enough about quality to add and maintain a linter, they're the kind of coworkers that care to do good code-reviews and maintain a high quality bar.
"IME linting picks up about 99% fluff (ooh look there's some whitespace on a blank line) and 1% potential issues highlighted (usually minor)."

What language are you using? Dynamic languages are fairly difficult to meaningfully lint because it's hard enough just to analyze a function for problems, let alone extend past that. Static languages are easier to write non-trivial linters for, because there's more information just sitting there in the source, waiting for something to grab it.

But even for dynamic languages, linters can still do things like enforce documentation standards, code testing/coverage standards, and some other basics that may sound trivial, but still add up to very useful things.

I find luacheck (a linter for Lua) invaluable for picking up typos, unused variables, variables shadowing other variables and unintentional global use (since Lua defaults to global by default). The types of checks seems mostly reasonable to me [1][2].

[1] http://luacheck.readthedocs.io/en/stable/warnings.html

[2] I'm not entirely happy with the formatting issues, but they can be easily disabled.

Edit: hit submit by mistake.

Depends. Some programmers have absolutely no hygiene wrt formatting. You'd want that sorted for readability.

But the bigger point is that you reduce merge conflicts since you won't have all sorts of unrelated formatting changes mixed in with the real changes.

> Some programmers have absolutely no hygiene wrt formatting.

I don't know how these kinds of people live with themselves or are tolerated by others. I hate it.

"The reason people like to default to linting when "fixing code quality issues" is purely because it's 100% objective and measurable, not because it picks up the truly important issues. It doesn't."

I think that ideally, linting and especially formatting help quality by taking the trivial issues off the table. If your code is currently "clean", and the tools reformat or flag the code that you just wrote, then hopefully those issues will never be committed into the repository, so human code review can be about substantive points.

Most linters allow you to disable whitespace linting. Good tools allow you to learn to use them better.
Great advice here. Agree on a process. Do it slowly.

I work with a lot of legacy code and have felt overwhelmed at times by all the different coding styles that have been used through the code base and not know when to refactor or simply clean up.

Over the last year or two I've been using a linter to help me with all this. Basically any code I touch gets cleaned to the point the linter isn't complaining. This is usually a fairly quick and fun process and I've actually learned a lot along the way.

I would refactor as soon as too many things were touching and the code started to have a smell about it even though the linter wasn't complaining.

It has made maintaining legacy code more fun (almost like a game) and given me some better practices when moving into new code.