I can only recommend Feathers' Working Effectively With Legacy Code, which far expands on OP's themes, notably OP is missing tests - how can you improve on existing code when you don't even know that it works like it says it works?
In some cases it shows its age (especially when it comes to rolling your own mock testing, most languages have automated frameworks for that now), but it's still a great overview of the techniques you can employ.
All professional programmers should continually read Martin Fowler's "Refactoring", in the same way English professors read Shakespeare.
"Refactoring" does not mean "previous programmer was an idiot". Often, the understanding of the problem and the way the code is used change over time. Refactoring is a way of bringing the out-of-control weeds back into healthy symbiosis with the larger garden of code.
I stopped telling clients a long time ago that I'm "refactoring" and instead I just add some extra time to all tasks to account for refactoring on legacy code.
I know this code is just an example, but it seems like it's entirely backwards. The "bad" code at the beginning is immediately clear in what it's doing, but the refactored code is significantly longer and hard to understand.
This. The problem is that in some cases, the legacy system is really messed up and making it unit testable is a big refactoring task on it's own.
In these cases, I try to write some functional tests first (such as calling restful endpoints and checking the response, or using a headless browser). Not great but much better than nothing. Any ideas on how to do it better?
Feathers defines legacy code as code without accompanying tests in "Working Effectively with Legacy Code". Even to the point where someone he knows said of a particular shop, "they're writing legacy code, man!"
Actually, I typically surround legacy code with functional tests instead. Unit testing is only really useful on blocks of code that you can safely wall off from the rest of the code base. Big balls of mud by their very nature don't really have that.
Much of the necessary refactoring for legacy code involves decoupling, which inevitably means changing method signatures and even replacing entire methods. If you surrounded those methods with unit tests which will break even when the functionality doesn't, you've made the code more resistant to refactoring, not less.
I actually wrote this! I can't believe tests slipped my mind, that's usually the FIRST thing I do because I'm always scared of breaking anything. I'm going to edit this post and add a test.
Sandi Metz talk "All the Little Things"[0] is amazing for understanding the general concepts around refactoring OOP code. It's ruby centric, but the concepts are applicable to all OOP languages. The most interesting part is the fact that during the refactoring process you should actually increase the amount of code and complexity to eventually consolidate and aggregate it down. This as opposed to simply trying to delete/mutate code.
if( masterList[z].list2 != NULL && masterList[z].list2.length() > 0 )
{
for( Integer y = 0; y < masterList[z].list2.length(); y++ )
The if-statement is a good summary of what is wrong with Java. The author doesn't even notice that the second argument of && is redundant and keeps it in the "refactored" version as well...
He then proceeds to iterate over the list using zero based indexing. The loop would not execute once if the list had length zero, making this check just plain wrong.
I know an even lower level language (C++, C) that doesn't have the problem of things which make no sense to be NULL (list elements). The problem is that Java did away completely with value types and made everything pointer only. That has been recognized by later languages (C#) and fixed.
The whole code consists of problems: the first is the language, the second is the programmer, the third is the missing for-each construct.
My word, something can be redundant (not conceding that point) and not "just plain wrong."
Your language hate and insistence that the programmer himself is one of the "problems" is why people don't do code reviews, and why people get overly defensive if you offer constructive criticism of code. Your criticism is not constructive.
If you are in a position of power or mentorship I suggest you take a moment to think how your words and actions influence those around you, particularly those less experienced who may look up to you.
> My word, something can be redundant (not conceding that point) and not "just plain wrong."
I think we can agree to disagree. I'm strictly against redundancy if it doesn't serve a well-defined purpose.
> Your language hate and insistence that the programmer himself is one of the "problems" is why people don't do code reviews, and why people get overly defensive if you offer constructive criticism of code. Your criticism is not constructive.
I would phrase my criticism entirely different if the recipient was someone who asked for my commentary and not someone who felt confident enough to write a blogpost on how to start refactoring code.
> I'm strictly against redundancy if it doesn't serve a well-defined purpose.
Well, a good optimizing compiler might factor redundant checks out. The JVM has one of the best optimizing compilers around... Sometimes source code clarity is better than "absolute correctness", especially when we're discussing something trivial.
> I would phrase my criticism entirely different if the recipient was someone who asked for my commentary and not someone who felt confident enough to write a blogpost on how to start refactoring code.
Right, because we should seek out excuses to be nasty to others. How about we just try to be constructive as much as possible?
> I know an even lower level language (C++, C) that doesn't have the problem of things which make no sense to be NULL (list elements)
You must null check a lot of things in C, especially since there is no graceful error handling (try/catch blocks)... C certainly allows things to be null (or garbage) values.
> The problem is that Java did away completely with value types and made everything pointer only.
I'm not sure what you are saying here -- the very notion of pointers do not exist in Java. This decision was made while creating the language, and avoids an entire class of programming errors. Java is strictly pass by value.
> the third is the missing for-each construct.
I completely agree with you here. Java does have a for-each construct, and it's recommended to use whenever possible. It avoids an entire class of programming errors.
> I'm not sure what you are saying here -- the very notion of pointers do not exist in Java. This decision was made while creating the language, and avoids an entire class of programming errors. Java is strictly pass by value.
To clarify: the GP is probably referring to boxed and unboxed data types. IIRC, Java has some unboxed data types ("primitive" types?), but mostly everything is boxed behind a pointer.
> but mostly everything is boxed behind a pointer.
Behind a Reference would be more accurate. It's just a reference to a spot in the heap. Other than similarly "pointing" to a place in memory, the comparison between Java References and C Pointers stops there. One cannot pass a "pointer" in Java, nor can the pointer be free-form manipulated like in pointer-arithmetic.
> You must null check a lot of things in C, especially since there is no graceful error handling (try/catch blocks)... C certainly allows things to be null (or garbage) values.
You are correct that adding C as an example language was wrong. C++ on the other hand still stands.
> I'm not sure what you are saying here -- the very notion of pointers do not exist in Java
Java references are just C pointers without pointer arithmetic.
You can't always rely on for-loop mechanics to do branching.
Also, having an implied "conditional" that must be extracted by reading the head-portion of a loop is very very unreadable. I'd reject your submission on a code-review.
Instead, you should do something like below. I assume that your example is contrived, so for the sake of argument, assume I have all sorts of business cruft around mine. I.e. should_process exists because there are business rules for processing/not processing the master_list.
def should_process(master_list):
return master_list is not None and len(master_list) > 0
def main_doing_of_stuff_foo():
if not should_process(masterList):
return # Yes, this thing should be in its own method so you can do an early exit.
for item in masterList: # Yes, you should be using an iterator as well, not indexing.
print("Body goes here".)
I can't follow. The code I showed is already processing masterlist and decides if it should process masterlist[z].list2 for every z: 0..masterlist.length.
I fully agree that the code in my post is very bad, I took it from the article.
The main point is that you shouldn't rely on an implied rule that comes from the oddity of the length being 0 on a list.
If I'm looping through a list, it's because I want to loop through it and process its items. I'm going to return/skip earlier due to whatever reason, and not rely on the "logic" for processing being built into the length of the list, or whoever populates the list.
> The author doesn't even notice that the second argument of && is redundant
Doesn't it check if a list isn't null and then if the list has at least one item in it? Or are you simply saying the for loop takes care of the situation where there are 0 items in the list?
Along the same lines: it's considered good practice to perform a refactoring in a separate commit from code which changes behavior. This makes it much easier to review the code, as you can look at the changes in isolation. If you refactor and then change behavior in a single commit, it can result in a very confusing change.
In some cases it shows its age (especially when it comes to rolling your own mock testing, most languages have automated frameworks for that now), but it's still a great overview of the techniques you can employ.