Refactoring: How do I even start? | HN Mirror

Y	Hacker News new \| ask \| show \| jobs

	Refactoring: How do I even start? (socalledprogrammer.com)
	29 points by thclark 4097 days ago

11 comments

a_bonobo 4097 days ago

I can only recommend Feathers' Working Effectively With Legacy Code, which far expands on OP's themes, notably OP is missing tests - how can you improve on existing code when you don't even know that it works like it says it works?

In some cases it shows its age (especially when it comes to rolling your own mock testing, most languages have automated frameworks for that now), but it's still a great overview of the techniques you can employ.

michaelvkpdx 4097 days ago

All professional programmers should continually read Martin Fowler's "Refactoring", in the same way English professors read Shakespeare.

"Refactoring" does not mean "previous programmer was an idiot". Often, the understanding of the problem and the way the code is used change over time. Refactoring is a way of bringing the out-of-control weeds back into healthy symbiosis with the larger garden of code.

I stopped telling clients a long time ago that I'm "refactoring" and instead I just add some extra time to all tasks to account for refactoring on legacy code.

zerohp 4097 days ago

I know this code is just an example, but it seems like it's entirely backwards. The "bad" code at the beginning is immediately clear in what it's doing, but the refactored code is significantly longer and hard to understand.

crdoconnor 4097 days ago

He missed step #1: write some tests that will provide feedback if you broke something, or assurance that you didn't.

gedrap 4097 days ago

This. The problem is that in some cases, the legacy system is really messed up and making it unit testable is a big refactoring task on it's own.

In these cases, I try to write some functional tests first (such as calling restful endpoints and checking the response, or using a headless browser). Not great but much better than nothing. Any ideas on how to do it better?

blt 4097 days ago

The closest thing I've seen to an actual definition of "legacy code" is "code that's very difficult to unit test." It's a shitty catch-22.

philbarr 4097 days ago

Feathers defines legacy code as code without accompanying tests in "Working Effectively with Legacy Code". Even to the point where someone he knows said of a particular shop, "they're writing legacy code, man!"

crdoconnor 4096 days ago

Actually, I typically surround legacy code with functional tests instead. Unit testing is only really useful on blocks of code that you can safely wall off from the rest of the code base. Big balls of mud by their very nature don't really have that.

Much of the necessary refactoring for legacy code involves decoupling, which inevitably means changing method signatures and even replacing entire methods. If you surrounded those methods with unit tests which will break even when the functionality doesn't, you've made the code more resistant to refactoring, not less.

squizzleflip 4096 days ago

I actually wrote this! I can't believe tests slipped my mind, that's usually the FIRST thing I do because I'm always scared of breaking anything. I'm going to edit this post and add a test.

mbesto 4097 days ago

Sandi Metz talk "All the Little Things"[0] is amazing for understanding the general concepts around refactoring OOP code. It's ruby centric, but the concepts are applicable to all OOP languages. The most interesting part is the fact that during the refactoring process you should actually increase the amount of code and complexity to eventually consolidate and aggregate it down. This as opposed to simply trying to delete/mutate code.

[0] - https://www.youtube.com/watch?v=8bZh5LMaSmE

pmr_ 4097 days ago

    if( masterList[z].list2 != NULL && masterList[z].list2.length() > 0 )
    {
        for( Integer y = 0; y < masterList[z].list2.length(); y++ )

The if-statement is a good summary of what is wrong with Java. The author doesn't even notice that the second argument of && is redundant and keeps it in the "refactored" version as well...

Alupis 4097 days ago

> The author doesn't even notice that the second argument of && is redundant and keeps it in the "refactored" version as well..

That's false. A list object may be initialized but be empty (making it's length zero). Or it may not be initialized (making it null).

Both conditions may happen independently of one another. He checks for the null first so that checking the length does not throw a NPE.

> The if-statement is a good summary of what is wrong with Java.

Frankly, I see nothing wrong here.

pmr_ 4097 days ago

He then proceeds to iterate over the list using zero based indexing. The loop would not execute once if the list had length zero, making this check just plain wrong.

I know an even lower level language (C++, C) that doesn't have the problem of things which make no sense to be NULL (list elements). The problem is that Java did away completely with value types and made everything pointer only. That has been recognized by later languages (C#) and fixed.

The whole code consists of problems: the first is the language, the second is the programmer, the third is the missing for-each construct.

pc86 4097 days ago

My word, something can be redundant (not conceding that point) and not "just plain wrong."

Your language hate and insistence that the programmer himself is one of the "problems" is why people don't do code reviews, and why people get overly defensive if you offer constructive criticism of code. Your criticism is not constructive.

If you are in a position of power or mentorship I suggest you take a moment to think how your words and actions influence those around you, particularly those less experienced who may look up to you.

pmr_ 4097 days ago

> My word, something can be redundant (not conceding that point) and not "just plain wrong."

I think we can agree to disagree. I'm strictly against redundancy if it doesn't serve a well-defined purpose.

> Your language hate and insistence that the programmer himself is one of the "problems" is why people don't do code reviews, and why people get overly defensive if you offer constructive criticism of code. Your criticism is not constructive.

I would phrase my criticism entirely different if the recipient was someone who asked for my commentary and not someone who felt confident enough to write a blogpost on how to start refactoring code.

Alupis 4097 days ago

> I'm strictly against redundancy if it doesn't serve a well-defined purpose.

Well, a good optimizing compiler might factor redundant checks out. The JVM has one of the best optimizing compilers around... Sometimes source code clarity is better than "absolute correctness", especially when we're discussing something trivial.

burntsushi 4096 days ago

> I would phrase my criticism entirely different if the recipient was someone who asked for my commentary and not someone who felt confident enough to write a blogpost on how to start refactoring code.

Right, because we should seek out excuses to be nasty to others. How about we just try to be constructive as much as possible?

Alupis 4097 days ago

> I know an even lower level language (C++, C) that doesn't have the problem of things which make no sense to be NULL (list elements)

You must null check a lot of things in C, especially since there is no graceful error handling (try/catch blocks)... C certainly allows things to be null (or garbage) values.

> The problem is that Java did away completely with value types and made everything pointer only.

I'm not sure what you are saying here -- the very notion of pointers do not exist in Java. This decision was made while creating the language, and avoids an entire class of programming errors. Java is strictly pass by value.

> the third is the missing for-each construct.

I completely agree with you here. Java does have a for-each construct, and it's recommended to use whenever possible. It avoids an entire class of programming errors.

burntsushi 4096 days ago

> I'm not sure what you are saying here -- the very notion of pointers do not exist in Java. This decision was made while creating the language, and avoids an entire class of programming errors. Java is strictly pass by value.

To clarify: the GP is probably referring to boxed and unboxed data types. IIRC, Java has some unboxed data types ("primitive" types?), but mostly everything is boxed behind a pointer.

Alupis 4096 days ago

> but mostly everything is boxed behind a pointer.

Behind a Reference would be more accurate. It's just a reference to a spot in the heap. Other than similarly "pointing" to a place in memory, the comparison between Java References and C Pointers stops there. One cannot pass a "pointer" in Java, nor can the pointer be free-form manipulated like in pointer-arithmetic.

pmr_ 4097 days ago

> You must null check a lot of things in C, especially since there is no graceful error handling (try/catch blocks)... C certainly allows things to be null (or garbage) values.

You are correct that adding C as an example language was wrong. C++ on the other hand still stands.

> I'm not sure what you are saying here -- the very notion of pointers do not exist in Java

Java references are just C pointers without pointer arithmetic.

burntsushi 4097 days ago

The problem you highlighted is just redundant case analysis. That's not a Java specific problem.

philbarr 4097 days ago

No, you don't need the the second argument since the for loop just won't execute at all if the list length is zero.

zo1 4097 days ago

You can't always rely on for-loop mechanics to do branching.

Also, having an implied "conditional" that must be extracted by reading the head-portion of a loop is very very unreadable. I'd reject your submission on a code-review.

Instead, you should do something like below. I assume that your example is contrived, so for the sake of argument, assume I have all sorts of business cruft around mine. I.e. should_process exists because there are business rules for processing/not processing the master_list.

    def should_process(master_list):
        return master_list is not None and len(master_list) > 0

    def main_doing_of_stuff_foo():
        if not should_process(masterList):
            return  # Yes, this thing should be in its own method so you can do an early exit.

        for item in masterList:  # Yes, you should be using an iterator as well, not indexing.
            print("Body goes here".)

pmr_ 4097 days ago

I can't follow. The code I showed is already processing masterlist and decides if it should process masterlist[z].list2 for every z: 0..masterlist.length.

I fully agree that the code in my post is very bad, I took it from the article.

zo1 4097 days ago

The main point is that you shouldn't rely on an implied rule that comes from the oddity of the length being 0 on a list.

If I'm looping through a list, it's because I want to loop through it and process its items. I'm going to return/skip earlier due to whatever reason, and not rely on the "logic" for processing being built into the length of the list, or whoever populates the list.

BinaryIdiot 4097 days ago

> The author doesn't even notice that the second argument of && is redundant

Doesn't it check if a list isn't null and then if the list has at least one item in it? Or are you simply saying the for loop takes care of the situation where there are 0 items in the list?

philbarr 4097 days ago

I actually don't think he should remove the second argument until the entire thing has been verified in tests. What if it was actually:

  masterList[a].list2.length()

..but he misread it as

  masterList[z].list2.length()

and so removed it..?

pmr_ 4097 days ago

You are saying that you actually need to read and understand code you are refactoring. Hard to disagree.

limeyx 4095 days ago

#1 If there are existing test cases - Run them - make sure they pass - make sure it looks like they have good coverage

#2 if there are no good test cases, write them and go to #1

pnathan 4097 days ago

Worth noting: some shops will summarily reject on code review any refactor that isn't part of the ticket/issue.

QuercusMax 4097 days ago

Along the same lines: it's considered good practice to perform a refactoring in a separate commit from code which changes behavior. This makes it much easier to review the code, as you can look at the changes in isolation. If you refactor and then change behavior in a single commit, it can result in a very confusing change.

pnathan 4097 days ago

Some shops reject commits that are refactoring. "Deliver value, don't just goof off". :-)

QuercusMax 4097 days ago

Sounds like they'd be awful places to work.

g8gggu89 4097 days ago

Why does he keep using Integer instead of int? And why use Object everywhere instead of generics?

OJFord 4097 days ago

I'm so frustrated by the minimum width, that I'm commenting instead of reading.

TeMPOraL 4097 days ago

<<So-Called "Programmer">> is the name of the blog; the title of submitted post is "Refactoring: How do I even start?". Could mods change this please?

pandatigox 4097 days ago

That's a good point. I clicked on it thinking it was one of those depressing rants about programming. Misleading and disappointed OP!

sctb 4097 days ago

Thanks, we updated the title.