Hacker News
by samsquire 1431 days ago
In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code.

People abstract before an abstraction is necessary.

I find single file dense leetcode style code easier to understand and follow the flow. Algorithmic code I can reason around. A large mature codebase is far harder to get to know.

One of the first things I do when I study a new codebase is find all the entry points and follow the flow of code from beginning to the thing I am interested in.

One person's beauty is another person's mess.

It's harder to change an existing codebase than to write a simple program that does the new thing but not in the context of the original program. A reference implementation of the various components is far easier to understand than one big ball of mud. Fitting problems together is hard. You need to understand the old thing before you can introduce the new thing and it ends up being forced or hacked in if the design doesn't support the new thing.

I tend to write reference implementations of everything, then combine them together as a separate project.

I find an empty file far more reassuring than a large codebase.

17 comments

Maybe I'm weird but a lot of my refactoring actually concretizes overly abstract code. It's easier to think about adding functionality to a block of code when you acknowledge that at the moment it only does 2 things, rather than using obscure wishy-washy language that implies it could do a dozen things.

Where I'm definitely weird is that I have a higher verbal score than your typical developer, and I'm not afraid to use a thesaurus to find a better word for something. Too often we end up recycling jargon in situations where they are not quite doing the same thing but nobody could be arsed to open thesaurus.com and find a word that telegraphs, "B is like A but is not actually A."

I think I'd like working with you.

Overly generic code is often pre-emptive, and most times the day never comes that you need that flexibility. And often when you do, you discover you need flexibility along a different axis anyway.

And most of what we do is story telling: what were the requirements we understood? What is our model to solve it? What are some precise examples that show it working in different capacities? When people treat the code as simply "the thing the computer interprets", instead of "the thing the next person has to comprehend", you get this inevitable slide into incomprehensible code.

Unfortunately our profession is obsessed with outdated approaches to (premature) performance optimization, and an addiction to being "clever".

I think I'd much rather surround myself with driven folks that have empathy and a strong desire to be understood. That's a long way from the programmer stereotype I'm familiar with.

> It's easier to think about adding functionality to a block of code when you acknowledge that at the moment it only does 2 things, rather than using obscure wishy-washy language that implies it could do a dozen things.

This is why Sum Types are so great. They give you an option between "concrete type" and "any type that implements this interface": "one of these specially enumerated types".
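
To make the "one of these specially enumerated types" idea concrete, here is a minimal sketch in Python. The `Circle`, `Rectangle` and `area` names are hypothetical and not from the thread. Python does not enforce the union at runtime, so the final `raise` is an assumption about how you might guard against unknown variants.

```python
import math
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Circle:
    radius: float


@dataclass(frozen=True)
class Rectangle:
    width: float
    height: float


# The "sum type": a Shape is exactly one of these two variants, nothing else.
Shape = Union[Circle, Rectangle]


def area(shape: Shape) -> float:
    """Handle each enumerated variant explicitly."""
    if isinstance(shape, Circle):
        return math.pi * shape.radius ** 2
    if isinstance(shape, Rectangle):
        return shape.width * shape.height
    # Not reachable for a well-typed Shape; guards against untyped callers.
    raise TypeError(f"unsupported shape: {shape!r}")
```

The code acknowledges that it handles two cases today, rather than implying it could handle any object implementing some open-ended interface.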

Yep, I do similarly.

More tightly bound code is often easier to understand and mechanically modify later - there are fewer places where you lose "if it compiles, it works" guarantees.

I feel like a lot of people are blindly pulling coding habits from libraries, and applying them everywhere. Libraries and applications (i.e. "terminal" products not used as a library by someone else) have different needs and different goals - don't write your application like a library, it'll be a huge pain.

I had to look up "telegraph" to confirm I'd understood you correctly. I don't recall seeing it used in the context of describing language, so I doubted my interpretation. I've heard it used most in discussions of boxing: a boxer's posture or their sequence of muscle activations _telegraph_ their planned attack such that their opponent has time to block or counter.

I like the way you used it. I'll try to use that myself.

It means communicating something quite clearly but by indirect means (and possibly inadvertently).
I've had this argument in code reviews a couple of times:

Them (on the subject of 4 new methods): hey why did you make method 3 look "weird"? You should make it look like the other three.

Me: because one of these methods can set the building on fire, and the rest can't. I bet you can guess which one is the dangerous one. Works as expected.

> People abstract before an abstraction is necessary.

This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

> ... not all refactorings improve the code.

While true, I have a low tolerance for code that requires constant bug fixing, or is so overly complex that the thought of modifying it makes you want to cry. Some projects truly require that level of complexity. But in my experience many do not, and once you've gained a solid understanding of the problems it is trying to solve, incremental refactoring is a fantastic way to improve the code's stability and maintainability. This is especially true in C++.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

Mastery is writing code in a way that does not impose unwarranted limitations from the start, while keeping it readable and containing only mandatory complexity.

Usually this can be achieved through deep understanding of the problem, mapping to simple concepts or finding or making that one concept that captures things well.

It cannot always be done. A masterful solution that keeps complexity low cannot always be found. However, it is definitely a mistake to draw a black and white picture of "if you want to make it work for the future, you must add complexity". Often people simply choose bad abstractions or wrong ones and only realize it when the future has become the present and the system they built cannot fulfill some requirement.

> Often people simply choose bad abstractions or wrong ones

When writing software, ideally I'd like to make all the right choices and use simple implementations of abstractions that do not impose unwarranted limitations.

I think that it's sometimes worth it, early on, to do things the quick way despite bad abstractions. This can get you to a place where it's easier to reason about good abstractions.

Sadly, I've been on teams where a bad abstraction was adopted because it was just assumed that that would be quicker. Instead of doing it the quick way, we just did it the bad way.

We write abstractions to tame complexity. But abstractions themselves are inherently a form of complexity. A good abstraction may be simple to use, understand, and extend, but it can also make it harder to understand or debug issues because underlying data/state has been obscured. I agree that making anything "black and white" is a mistake. I merely said what I said because in my experience, most developers tend to abstract things that don't benefit anyone, and add unnecessary complexity. Besides, if you're writing an abstraction for a future hypothetical problem, chances are that you don't have enough information to create a good abstraction, and you're going to have to redo it anyway.
> Often people simply choose bad abstractions or wrong ones...

This is me. They tend to lead me to good abstractions (after merciless refactoring), and I'd like to think that I'm sucking less at this over time. But my overall process is very slow (good thing I'm self employed). Understood that it'd be better to stop and think instead of diving into new-abstraction boilerplate work.

Worse though is to be under heavy pressure to ship and move on — with the bad abstractions getting hopelessly calcified / buried.

>> People abstract before an abstraction is necessary.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

I worked with a guy who did that. He had a plan for what the project would look like 5 years down the road, and he built abstractions to support that. He could get away with it because he could hold it all in his head and it all made sense to him. When version 1 was half finished he was called away to work on another project, and those of us who followed in his wake struggled to make any sense of what he left behind. A year later he was laid off. The project was a success, but nobody ever asked for version 2.

>> People abstract before an abstraction is necessary.

> This one really frustrates me. Write code to the complexity level needed to solve the problem, and nothing more. The only time I'd break from this is if I know for certain that the added complexity is going to be necessary in the near term.

No, the ecosystem the code exists in and my ability to reason about the codebase is worth way more than any gain that comes from blindly stacking "simplest solution for problem a, b, .. z" atop one another without regard for higher level understanding of a codebase.

I was not implying a lack of understanding of the ecosystem/codebase or to blindly stack "simplest solutions" on top of each other. I was stating that one should not add more complexity than is necessary for the problem at hand. Often people solve a simple problem with a complex solution due to some hypothetical future problem they've conceived in their mind. Most of the time these hypothetical problems never materialize, and they are now stuck with code that is much more complex than it needed to be.
> One person's beauty is another person's mess.

This is so true. Also I believe that when the original author wrote the code he had a (hopefully) clear vision of the solution. He wrote it as tidy and fitting to the problem as he saw it. Then sometime later someone else comes in and is supposed to alter the code in a way which does not fit the original author’s idea of the problem. This creates a mismatch. The new guy can’t and won’t change the code too much, as it is too risky/much to do, and therefore will only do as little as possible to make his change. Then someone else comes along and makes some more changes, which again aren’t enough, etc etc, et voilà, you have a ball of mud that screams for a rewrite.

Spot on. You've described Peter Naur's "Programming as Theory Building" paper exactly.
I try to encourage newcomers to refactor the code into a form they understand, fix the problem and then undo that refactoring as much as possible. If they actually come up with a better abstraction I'm up for it.

Refactoring will give them the chance to see what the actually moving parts of code are.

I like your use of the words moving parts. Eventually code ends up looping over memory locations, copies, moves, adds, subtracts, multiplies, divides, reads, writes data or memory locations.

All the files and code on the way to get this to happen, such as classes, parameters, arguments, variables, functions, methods, closures and objects, are ideas of the language's compiler to abstract the instruction stream.

Command line arguments, class constructors, URL query parameters, marshalling, JSON field names, method parameters, function arguments, HTTP headers, cookies, request objects, events are just complicated variations of passing data in the right shape. They are not the above list of "moving parts" or computation that is easy. In other words modern coding is just configuration.

I feel the complexity of modern code is a problem we created. And I feel there's something missing. It's hard to update code.

When I find the loop that does the thing, I feel I can understand the codebase such as the magic +1, -1 or the relationship of objects linked together in a data structure or the assignment to a list or array or variable.

"How does that get to here"

> I find single file dense leetcode style code easier to understand and follow the flow. Algorithmic code I can reason around. A large mature codebase is far harder to get to know.

I genuinely can't tell if you're being serious or not. If you are, do you also like to read books written as one giant chapter? Or entire chapters as one giant paragraph?

I have trouble with you equating "leetcode style" with "one giant chapter" and also with you equating "enterprise code" with one chapter following another, because when I read enterprise code it's

1. read one line of first chapter,

2. then skip to the last sentence of the middle chapter,

3. then realize the first chapter was actually the penultimate,

4. then read the fourth sentence of the first paragraph of the second chapter, bearing in mind what you have learned,

5. then throw your hands up in the air in despair

Jumping around is the nature of code in general, whether it's in a 1000-line file or split up amongst multiple files.

For a maintenance programmer, they may already understand how everything works. They aren't following a particular code path, necessarily. Maybe they're working on a new feature and they need to re-familiarize themselves with previous chapters. In that case, it's nice to jump to a file that concerns itself with things grouped together.

>> things grouped nicely together

fixed it for ya :)

I'm a maintenance programmer. Even working layers of layers of layers above the actual shit does not make it not stink.

>> jumping around

...should be intuitive and joyful, not a disaster to your brain.

EDIT: I am a fan, though, of SOC. I guess enterprisey code tries to be that (but fails hard at it).

Okay, great. You are a maintenance programmer. You have a 1000-line program all in one file. You need to update the e-mail functionality of this program. You aren't following a stack trace or following a particular code path. There are 15 or 16 different e-mail related functions. How do you find the e-mail function you need to update? Do you memorize line numbers? Use regex search? Do you have vim marks setup?
>> here are 15 or 16 different e-mail related functions

Is this you showboating the greats of layered code?

Depth is not width and width is not depth, but surely you see the difference in the two?

If you have 15 or 16 different e-mail related functions spread across a whole bunch of files, how do you find the one you need to update?
At one point in time I used OpenGrok to try to understand large projects.

Without documentation I find large projects difficult to understand. There's literally too many global symbols and I cannot see the forest for the trees.

What's the model of this program? What are the core principles that the author is using? Do I really need to read every file to understand what is going on?

With Leetcode style programs there's one file with everything in it and I can usually find the entry point. The problem is well defined.

I can see the moving parts in a Leetcode style problem. The looping, the data structure creation and control flow, arrays and recursion.

Large mature codebases such as Java projects have thousands to millions of small files and packages, so it can be difficult to see how things fit together; every file seems to be 10-20 lines long.

I like C projects as they have lots of code in one file. Everything I need to understand a module is in one file and I can use vim folding.

I’ve been programming for nearly 30 years and I feel the same way. 1000loc+ files are increasingly common in my code bases. I find it really hard to read code with lots of tiny files that individually don’t do anything. It’s like the programmer is embarrassed by their code so they’re making me search for the core logic.

Splitting code into multiple files really only makes sense to me when there’s a clear division of responsibilities. That can mean a lot of things - like client / server, utility methods / core algorithm or class A / class B. But plenty of complex data structures are much easier to read and understand all at once. For example, I have a rust rope library which implements a skip list of gap buffers. The skip list is one (big) file. The gap buffer is another file. Easy.

Same. Well, 1000 is an outlier but 300-600 feels right. I sometimes feel bad for doing it, because it's not what some other people might consider good code to look like. I occasionally have to do code review or help fix a bug in the other kind of codebase and even the devs often don't seem to understand how those thousand 15-line files all fit together.
> Without documentation I find large projects difficult to understand. There's literally too many global symbols and I cannot see the forest for the trees.

Good abstractions let you see the shape of the forest.

Most "good code" or "simple code" is an inedible potluck of "simplest solution for the problem at the time" with some documentation.

Actual good code has intention revealing abstractions that communicate the essence of the problem and problem domain.

back when i inherited a legacy C codebase i found https://www.jgrasp.org/ pretty useful to explore large files.
I could see this working with vim folding. That's an interesting approach. I never really used folding in IDEs.
The book analogy you use isn't very accurate. Even if you merge chapters and paragraphs like that, you still read it sequentially. Just in a less comfortable way.

Which is not at all like a modern codebase that is modular, abstracted, etc. If you're new to a codebase, and want to understand one particular feature, you'd likely need to jump back and forth across 10 files.

It's not far-fetched to say that makes it difficult to understand.

The thing is that if you just need to understand a specific part of something you will need to jump as well, even if everything you needed for that one thing is written sequentially in one file. You will want to skip over implementation details of certain things to get the general picture first on a more abstract level.

Let's say you have a simple endpoint that takes a list of comma separated inputs, parses them as numbers and spits back a sorted version of that.

I don't want to see a version of that, which a compiler might have inlined. Including the implementation of the sorting algorithm. I only want to see a high level of abstraction version of it. Basically just something like (pseudo code in a non existent language) :

    fun endpoint(input):
      inputs[] = split(input, ',')
      numbers[] = parseAsIntegers(inputs)
      return quicksort(numbers)
I can easily understand what this does and what the idea is behind this "algorithm" in 3 lines. If I had the "inlined" version of this I would have to manually identify each of these parts and potentially skip over tens to hundreds of lines.

I think this is really a bit about trust. Do you trust that these named functions I am calling do what their name says? Does quicksort actually do a quicksort or has someone implemented bubble sort in there? Of course this is a minimal example and especially quicksort would probably just be a library, but imagine all of these were large complex pieces of our code base.

Personally I am an advocate for using functions (methods or whatever your language calls them etc) and naming them properly and then trusting those names by default. I want to spend time making this nice and understandable and abstracted once when writing. Not every time someone reads it. As soon as something does not seem to behave in the way the name suggests I will then and only then go check the actual implementation and for example find out that parseAsIntegers actually also supports floats and quicksort is not actually quicksort but bubblesort and that is why this endpoint was slow etc.
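
For readers who want something runnable, the endpoint described above could be sketched in Python roughly as follows. The helper names are hypothetical, and the sort uses Python's built-in `sorted` rather than a hand-written quicksort, which is exactly the kind of detail a reader trusts the name to hide.

```python
from typing import List


def split_input(raw: str) -> List[str]:
    """Split the comma separated request body into trimmed tokens."""
    return [token.strip() for token in raw.split(",")]


def parse_as_integers(tokens: List[str]) -> List[int]:
    """Parse every token as an integer; int() raises ValueError on bad input."""
    return [int(token) for token in tokens]


def sort_numbers(numbers: List[int]) -> List[int]:
    """Return a sorted copy using the built-in sort (not a quicksort)."""
    return sorted(numbers)


def endpoint(raw: str) -> List[int]:
    """High-level flow: split, parse, sort. Details live in the helpers."""
    return sort_numbers(parse_as_integers(split_input(raw)))
```

Calling `endpoint("3, 1,2")` returns `[1, 2, 3]`. You only need to open `parse_as_integers` or `sort_numbers` when the behaviour stops matching the name.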

This would be the “simple” code. The “abstract” code would be more like this:

  public class EndpointManager {
    private EndpointInputManager eim;
    private StringSplitter splitter;
    private NumberParser parser;
    private Sorter sorter;
    public EndpointManager(EndpointInput input) {
      eim = new EndpointInputManagerFactory().setInput(input).build();
      splitter = new StringSplitterFactory().setDelimiter(new Delimiter(",")).build();
      parser = new NumberParserFactory().setFormat(NumberParserFormat.INTEGER).setMode(NumberParserMode.LIST).build();
      sorter = new SorterFactory().setSortOrder(SortOrder.ASCENDING).setAlgorithm(SortingAlgorithm.QUICK_SORT).build();
    }
    public EndpointOutput endpoint() throws ParseException {
      splitter.split(eim.getInput());
      parser.parse(splitter.getList());
      sorter.sort(parser.getOutput());
      return new EndpointOutputFactory().setOutput(sorter.getSortedList()).build();
    }
  }
My guess is this is Java or something close? We can make that much more readable. We may have to do away with bad libraries. A lot more could actually be magicked away, which can also be a problem sometimes. In a "real" application this resource's interface would probably not just take a comma separated string in a body but accept a proper JSON object or somesuch and not just be a "sort endpoint" but it's not going to be much different from this if written properly. I happen to like the few annotations you'll see me use here. I also think that something as simple as a line break can unclog things. Also the choice of having each of the stream operations in a separate line is deliberate for readability. There are linter/auto formatting rules to enforce this (we do this at my current place for example).

    @Path("/sort")
    public class SortResource {

        @GET
        public List<Integer> sort(@Body String input) {
            validate(input);
            return Arrays.stream(input.split(","))
                .map(Integer::valueOf)
                .sorted()
                .collect(Collectors.toList());
        }
    }
I do recognize the kind of code you pasted. Had to work in code bases like that for way too long. Never want to work in one of those again. There's probably lots of EJBs and other such nonsense around that?
I was going to make the predictable joke that you need a factory. Disappointing that you took care of it already.
You can easily understand what it does because you picked an example that is easy to understand :)

This discussion often ends up in the extremes: inline everything versus abstract everything. I don't think anybody reasonable would opt for either of those extremes, we should focus on the very large middle ground where there's a lot of subjectivity.

Trust me on this, I've been raised on the DRY dogma and all related architectural patterns in favor of abstraction. I've lived the life, for 2 decades. But I cannot ignore the outcomes. Most codebases are extremely difficult to understand and it's very painful to change things. As 90% of all software development is maintenance, that's a planetary-sized problem.

This doesn't mean you should inline everything, it means sane choices. As a simple example, say you're using a literal in your code:

    if (orderAmount > 100000)

This code is incredibly easy to read. Common convention says to put this in a constant, at the top of the file. Old me agrees, new me does not. For as long as it's the only occurrence of the value, it doesn't need abstraction. The only thing it would do is make the code more difficult to understand. The very eager abstracter might even put that constant in a separate file.

The point of this example is to abstract based on real reuse, not imagined reuse. I'm not against abstraction only against unnecessary abstraction.

A second example. Say you have a reusable UI component. A change request comes in that's pretty large and specific for one niche need. It kind of goes against the spirit of the initial purpose of the component but is still related enough to consider it in scope of the component.

Old me might add a "toggle" to the component, after which it can render in two modes. Sometimes called a "god component". This approach sucks. It makes the component much more complicated and changing and testing it becomes a nightmare.

Even older me would break down the component into smaller components and then "compose" them based on its mode. This is even worse, now you have to jump around many places whilst the sub components are never actually reused (pointless abstraction).

New me says fuck it and splits the component in two. Allowing significant code duplication between both components. It's not as radical as it sounds, it's in fact incredibly comforting. Each component can easily be understood (less complexity) and making changes becomes far less stressful as your blast radius is tiny.

Developers spend the vast majority of their time not coding, but figuring out how something works and how to make a change that doesn't break anything.

It may not have been evident from my simple example but I do agree with your "middle" approach. That's where I try to end up in our code base. Endless interfaces, methods that are only one line long and such are counter productive. But nobody can tell me that inlining quicksort will ever be useful outside of a place where your compiler can't do it for you and you need to favour execution speed over everything else. I don't believe such places really exist much if at all any longer.

What I do have to question is the strict non-use of constants. It can be very very useful to use constants for such things, e.g. if you are calling libraries that do not make it apparent what is what. Say you have something that takes a timeout value.

    send(data, 10)
What is this? I have to know what send is, what parameters it takes etc. I might have to look that up. I can easily work around that with a constant.

    const timeoutInMillis = 10
    send(data, timeoutInMillis)
The same principle can easily apply for other similar situations. I really like it for things like

    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, 64800000)
What is that and what does that value even mean in human-readable terms? Of course some of these values you will recognize if used enough, but so far my domains have been sparse enough that I don't recognize all of them and have to compute. Much better (with shorter, real names anyway but ya know, we're dealing in simple examples here :) ):

    REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS = 1000 * 60 * 60 * 18
    doSomethingThatCouldTakeLongButAlsoShouldHaveATimeout(data, REALLY_LONG_RUNNING_PROCESS_TIMEOUT_IN_MILLIS)
So far most people I've talked to find that it's much easier to recognize that this has a timeout of 18 hours but the method happens to want milliseconds.
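
As a hedged variation on the same idea, some languages let the unit live in the type instead of the constant's name. A minimal Python sketch, assuming a hypothetical `to_millis` helper and constant name (only the 18-hour value comes from the example above):

```python
from datetime import timedelta

# Hypothetical constant name; the duration is stated in hours, not raw millis.
LONG_RUNNING_TIMEOUT = timedelta(hours=18)


def to_millis(duration: timedelta) -> int:
    """Convert a timedelta to whole milliseconds using exact integer division."""
    return duration // timedelta(milliseconds=1)
```

Here `to_millis(LONG_RUNNING_TIMEOUT)` gives 64800000, the same number as `1000 * 60 * 60 * 18`, but the reader never has to do that multiplication in their head.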

Oh and don't get me started on people that use the constants from the real code in their tests, completely defeating the testing. Especially if they then do math with the constants and simply copy the math - or worse, put the math into a method and call it from the tests too - to their tests. Test expectations have to be computed once, when writing the test and just hardcoded into them, otherwise they serve no purpose as changing the code itself will always result in green tests even if you've just made a major mistake by changing the values without thinking.
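
A small sketch of the difference, using made-up names and values (the shipping fee example is hypothetical, not from any real codebase). The first test mirrors the production math and so can never fail when the constant is changed by mistake; the second hardcodes the expectation.

```python
# Hypothetical production code.
SHIPPING_FEE_CENTS = 499
FREE_SHIPPING_THRESHOLD_CENTS = 5000


def total_cents(subtotal_cents: int) -> int:
    """Add the shipping fee unless the order qualifies for free shipping."""
    if subtotal_cents >= FREE_SHIPPING_THRESHOLD_CENTS:
        return subtotal_cents
    return subtotal_cents + SHIPPING_FEE_CENTS


def test_total_tautological() -> None:
    # Weak: reuses the production constant, so it stays green even if
    # SHIPPING_FEE_CENTS is accidentally changed to a wrong value.
    assert total_cents(1000) == 1000 + SHIPPING_FEE_CENTS


def test_total_hardcoded() -> None:
    # Stronger: expectations were computed once, by hand, and written down.
    assert total_cents(1000) == 1499
    assert total_cents(5000) == 5000
```

If someone edits `SHIPPING_FEE_CENTS` without thinking, only `test_total_hardcoded` turns red.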

I think he is, can second that. Not having to jump around 10 files with multiple classes and remembering where goes what usually means I can understand the code faster.

Not sure about literature but for almost any technical matter I prefer learning from the smaller details instead of the big picture - it's often too vague and just doesn't stick in my memory.

Reading enterprise code is like reading a book where 99% of the pages don't contain any meaningful information.
>I find single file dense leetcode style code easier to understand

I find it really difficult to go through huge chunks of iterative code. I need abstractions otherwise I can't get my head around it. I often wind up refactoring into manageable chunks (even in pseudocode/diagrams) just so I can understand stuff.

For reference, my cognitive abilities are heavily skewed towards verbal/abstract reasoning - like several standard deviations above the norm - and my spatial/concrete reasoning is nearly the inverse of this, it's terrible.

I wonder if this has something to do with it!

> find it really difficult to go through huge chunks of iterative code. I need abstractions otherwise I can't get my head around it. I often wind up refactoring into manageable chunks (even in pseudocode/diagrams) just so I can understand stuff.

Understanding lots of code at a module, function, or even more granular level with a magnifying glass feels more productive than struggling to understand the full picture.

It also rewards you with instant gratification. Reading and writing concrete code gives much more immediate gratification.

One of my recent coding experiences was teaching a friend in grad school (MA) to code sufficiently well to finish his Master's project.

Refactoring was absolutely necessary for this. He was writing a single simulation program that was single file in size. But once he'd created ten subroutines all changing a raft of the global variables, the slightest changes produced hair-raising bugs that he'd obsessively dive into debugging.

The intuitions of structured programming and object oriented are more important than absolute fidelity. My points were: "If you can't have an object here, at least have a well defined, standard interface to values that need to be in a consistent state" and "decompose long action sequences into subroutines and if you can't do that, least group similar actions with similar actions in that long action sequence".
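
A rough illustration of the first point, assuming a toy simulation with made-up fields (`position`, `velocity`, `time`); the friend's actual program is not shown in the thread. Instead of ten subroutines each mutating globals, the values that must stay consistent are grouped, and only one method advances them.

```python
from dataclasses import dataclass


@dataclass
class SimulationState:
    """All values that must stay consistent with each other live here."""
    position: float = 0.0
    velocity: float = 0.0
    time: float = 0.0

    def step(self, acceleration: float, dt: float) -> None:
        """The only place that advances the state, so updates stay in sync."""
        if dt <= 0:
            raise ValueError("dt must be positive")
        self.velocity += acceleration * dt
        self.position += self.velocity * dt
        self.time += dt


def run(steps: int, acceleration: float, dt: float) -> SimulationState:
    """Run the simulation from a fresh state; no global variables involved."""
    state = SimulationState()
    for _ in range(steps):
        state.step(acceleration, dt)
    return state
```

With this shape, a bug in how velocity feeds into position has exactly one place to hide.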

Which is to say a given piece of code might not have the structure you want, but if it has a structure, that can be enough. But then again, that piece of code might not have structure at all, and then rewriting it really is necessary and often is easier than debugging it a few times.

And working with a large piece of "bad" corporate code, I've found more than once that you have something with one sensible if idiosyncratic structure that was refactored more than once by people who didn't understand the structure and imposed their own structure on just part of the code. But through an exercise in archeology, one can make the whole artifact work.

But that doesn't mean you can't have code that is a true mess when the writer has no experience and no concern with structure.

> People abstract before an abstraction is necessary.

Sometimes an abstraction cuts to the core of the reason why.

See for example https://algebradriven.design/

Good abstractions can communicate intent better than mounds of concrete code because they speak at a higher level.

However, mounds of okay concrete code are way easier to deal with than poorly thought out abstractions.

This means pragmatists get little practice in abstractions, where their pragmatism is needed most to uncover the useful abstraction and avoid the overly complex invented abstraction.

Abstract code also has the advantage of parametricity in strongly typed programming languages.

I started designing an algebraic language by writing code.

https://GitHub.com/samsquire/algebralang

It's designed to be expressive and powerful and practical.

The core insight to a problem is rarely what we spend most of our programming time doing.

> The core insight to a problem is rarely what we spend most of our programming time doing.

I believe that is a mistake.

I'll have to check your language out though!

Are those code samples formatted correctly? (they are difficult to read as is)
> In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code.

That's a great point. I once read (here on HN I think) that the value behind a piece of software is not the code but the team whose members all have the same mental model of the problem and can successfully map it to the code. Lose the team and you lose that map.

> One of the first things I do when I study a new codebase is find all the entry points

I follow the same strategy but… Good luck with that when you're facing a Spring application. :)

> I tend to write reference implementations of everything, then combine them together as a separate project.

This is how I write software. My stackblitz is full of domain-independent experiments: https://stackblitz.com/@Pyrolistical. I then copypasta these into private projects once I've figured out how the individual pieces work.

Yeah I love this approach too. I’ve been learning CRDTs lately and I’ve gotten so much value from making tiny, inefficient reference implementations of things before diving in and optimizing. Toy implementations shake out all your misunderstandings of your design, and you can refactor like crazy. Going from simple code that works to complex code that works is much easier than creating the complex code correctly from scratch, in situ.
Also, I've discovered (and reported) so many bugs when I realized my very simple toy example was broken.
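For a sense of how small such a toy can be, here is a sketch of a grow-only counter (G-Counter), one of the simplest CRDTs. The class and method names are my own choices, not taken from any library or from the commenter's code, and it makes no attempt at efficiency.

```python
class GCounter:
    """Toy grow-only counter CRDT: one slot per replica, merged by max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> number of increments seen from it

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a grow-only counter cannot decrease")
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica, count in other.counts.items():
            self.counts[replica] = max(self.counts.get(replica, 0), count)
```

With something this small, it is easy to check by hand that merging twice, or merging in either order, gives the same total, which is the kind of misunderstanding a toy is meant to shake out.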
Is it deliberate that your comment itself is like an instance of the methodology you describe? Small separate stand-alone thoughts vs. a well-understood description (harder to write, to consume) in longer form commentary?

That's not a criticism at all by the way, I just found it striking.

It was an accident and not deliberate, but you may have revealed how I think. Thank you for this insight.

In hindsight, if you focus on three questions (Is it good? Is it right? Is it true?), you'll head in a good direction.

Synthesis of ideas is really important and the building blocks of understanding are fascinating. Programming and mathematics are taught as building blocks and then deliberate practice.

I really enjoy reading plain descriptions of things, especially of other people's code.

If you understand the core insight, difficult things can be easier to understand and apply for you.

I really want to understand how tracing compilers, LuaJIT, the JVM and V8 work, but I found the code a bit too hard to understand as I jumped into the wrong locations.

There have been two instances where Wikipedia was enough for me to understand and write an algorithm that implemented the description. Wikipedia doesn't have pseudocode for multiversion concurrency control but it does have an accurate if subtle description. I did the same for B-trees, but I did read some other people's implementations to get a feel. I of course wrote mine completely differently.
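To give a flavour of what a description-driven toy of multiversion concurrency control might look like, here is a single-threaded sketch. It is not the implementation mentioned above: `Store`, `Transaction` and `Conflict` are hypothetical names, it assumes a simple snapshot-isolation scheme with a first-committer-wins rule, and it never garbage-collects old versions.

```python
class Conflict(Exception):
    """Raised when a transaction loses a write-write conflict."""


class Store:
    def __init__(self):
        self.clock = 0      # timestamp of the latest commit
        self.versions = {}  # key -> list of (commit_ts, value), oldest first

    def begin(self):
        return Transaction(self, self.clock)


class Transaction:
    def __init__(self, store, snapshot_ts):
        self.store = store
        self.snapshot_ts = snapshot_ts
        self.writes = {}    # buffered until commit

    def read(self, key, default=None):
        if key in self.writes:
            return self.writes[key]
        # Newest version committed at or before our snapshot.
        for commit_ts, value in reversed(self.store.versions.get(key, [])):
            if commit_ts <= self.snapshot_ts:
                return value
        return default

    def write(self, key, value):
        self.writes[key] = value

    def commit(self):
        # First committer wins: abort if a newer version of any key we
        # wrote was committed after our snapshot was taken.
        for key in self.writes:
            history = self.store.versions.get(key, [])
            if history and history[-1][0] > self.snapshot_ts:
                raise Conflict(key)
        self.store.clock += 1
        for key, value in self.writes.items():
            self.store.versions.setdefault(key, []).append((self.store.clock, value))
```

The core idea it tries to capture is that readers never block: a transaction keeps reading the versions that existed when it began, even while later writers append newer ones.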

I want people to document their code enough so that the core principles or idea behind their code could be reimplemented by someone else just by reading the description of how it works.

RPython and PyPy documentation is good but I still don't understand it enough to implement what it does. Which means I'm missing some detail or core insight.

What a great way to do it, too, because once you're done, your "reference implementations" can later serve as test harnesses if need be.
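A small sketch of that idea, assuming a made-up problem (maximum subarray sum) and hypothetical function names: the slow, obviously correct reference implementation is kept around as the oracle for the optimized version, and random inputs are checked against it.

```python
import random


def max_subarray_reference(xs):
    """Brute-force reference: try every non-empty contiguous slice."""
    return max(
        sum(xs[i:j])
        for i in range(len(xs))
        for j in range(i + 1, len(xs) + 1)
    )


def max_subarray_fast(xs):
    """Optimized single-pass version (Kadane's algorithm)."""
    best = current = xs[0]
    for x in xs[1:]:
        current = max(x, current + x)
        best = max(best, current)
    return best


def check_against_reference(trials=200, seed=0):
    # Seeded so a failure can be reproduced; the failing input is
    # attached to the assertion message.
    rng = random.Random(seed)
    for _ in range(trials):
        xs = [rng.randint(-10, 10) for _ in range(rng.randint(1, 12))]
        assert max_subarray_fast(xs) == max_subarray_reference(xs), xs
    return trials
```

The reference stays simple enough to trust by reading it, which is what makes it useful as a harness once the production version gets complicated.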

> I tend to write reference implementations of everything, then combine them together as a separate project.

"In my experience people refactor code to their own understanding of the problem and not all refactorings improve the code."

Same for rewrites. Often the rewrite will have the same number of problems, just different ones.

If you are out there and WANT to write terrible code, here is an amazing essay: https://cs.fit.edu/~kgallagher/Schtick/How%20To%20Write%20Un... I cried the first time I read this.
Generally most of my refactoring is cutting out code; I love an excuse to chunk down excessively verbose classes. Maybe I'm weird, but I generally prefer a lean, concise codebase that might share a few free functions to a perfect OOP pyramid.
No mention here is made of test coverage (hopefully because everyone already knows); to refactor without near-100% test coverage is insanity.
> People abstract before an abstraction is necessary.

This!