Hacker News new | ask | show | jobs
by jstimpfle 2441 days ago
I'm still trying to sort this out here, not sure if I will manage, so take my apologies if it's a little confused.

I agree with this idea to a degree. However, there are limits to this "relation". Imagine a sophisticated software that compresses program sources. Let's say it operates on the AST level, and not on the byte level, since the former is a little bit closer to capturing software complexity, as you mentioned somewhere else. Now, I know that that's almost the definition of a LISP program, but maybe we can agree that 1) macros can easily become hard to understand as their size grows 2) there are many rituals programmers have in code that shouldn't get abstracted out of the local code flow (i.e. compressed) because they give the necessary context to aid the programmer's understanding, and the programmer would never be able (I assume) to mechanically apply the transformations in their head if there are literally hundreds of these macros, most of them weird, subtle, and/or unintuitive. Think how gzip for example finds many surprising ways to cut out a few bytes by collapsing multiple completely unrelated things that only share a few characters.

In other words, I think we should abstract things that are intuitively understandable to the programmer. Let's call this property "to have meaning". What carries meaning varies from one programmer to the next, but I'm sure for most it's not "gzip compressions".

One important measure to come up with a useful measure of "meaning" is likelihood of change. If two pieces of code that could be folded by a compressor are likely to change and diverge into distinct pieces of code, that is a hint that they carry different meanings, i.e. they are not really the same to the programmer. How do we decide if they are likely to diverge? There is a simple test, "could we write any of these pieces in a way that makes it very distinct from the other piece, and the program would still make sense?". As an aspiring programmer trying to apply the rule of DRY (don't repeat yourself) at some point I noticed that this measure is the best way to decide whether two superficially identical pieces of code should be folded.

I noticed that this approach defines a good compressor that doesn't require large parts of the source to be recompressed as soon as one unimportant detail changes. Folding meaning in this sense, and folding only that, leads to maintainable software.

A little further on this line we can see that as we strip away meaning as a means of distinction, we can compress programs more. The same can be done with performance concerns (instead of "meaning"). If you start by ignoring runtime efficiency, you will end up writing a program that is shorter since its parts have fewer distinctive features, so they can be better compressed. And if the compressed form is what the programmer wrote down, the program will be almost impossible to optimize after the fact, because large parts essentially have to be uncompressed first.

One last thought that I have about this is that maybe you have a genetic, bottom-up approach to software, and I've taken a top-down standpoint.

1 comments

> Imagine a sophisticated software that compresses program sources...

Isn't that the definition of a compiler?

https://en.m.wikipedia.org/wiki/Partial_evaluation#Futamura_... are an interesting point of view for the definition of a compiler