Hacker News new | ask | show | jobs
by abelhabel 3422 days ago
I found this interesting.

"if a physical law can be as complicated as the experimental data that it explains, then there is always a law, and the notion of “law” becomes meaningless! Understanding is compression! A theory as complicated as the data it explains is NO theory"...

Then saying that program -> computing -> output has to follow from small to large in size, ie the program is the algorithmic information and the output is the data it explains.

Looking at modern day web development we are seeing the opposite, where the program is many times bigger than it's output, suggesting to me that the modern frameworks do not come from any sound theory.

8 comments

> Looking at modern day web development we are seeing the opposite, where the program is many times bigger than it's output

You're seriously underestimating the size of output in a web browser. Consider that the CSS, JS and HTML describe the position of every pixel and its motion on the screen.

Not just the size of a given output, but the space of possible outputs. I am also of the opinion that webdev is bloated and miserable, but it is still a massive compression of the information it represents.
I don't really buy that acres of whitespace and gradients really describe much information.

Considered at the boundary of your awareness--in that you are mostly unaware of all the pixels on the page, then absolutely not true. A typical news article might be a few KB of text but MBs and MBs of images that you never look at.

I'm talking about Shannon Information specifically. In which case, acres of whitespace and gradients very much are information. Calling them "whitespace" and "gradients" is a significant act of compression in and of itself.
I understand your argument, but even then I don't think it's quite right; there is lots of room to improve on the size of a webpage vs its rendered pixels on the screen, even with the fancy fonts and gradients and animations. I still don't believe it's really megabytes. Have you seen 4k demo contests, btw? https://www.youtube.com/watch?v=0w_xEUoK79o
Yes, I agree. I also realised it wasn't really accurate of me of comparing an applied theory to the theoretical theory, as the author calls it.
You are describing "generalization". Lookup Tables (LUT) can still be a valid technique if the problem space is small enough.

If you use a table for trigonometry functions then interpolation is generally used... So it doesn't have to be all-or-nothing.

Compression generally works by removing redundancy via some means or another according to some scheme... But compressed data can take up more space if it goes against the grain (i.e. RLE when the repeated byte is the same as escape byte). Randon number generators with simple seeds (not a hidden entropy pool) can be thought of as disrupting simple patterns in a sequence.

EDIT: just thought of this. http://mathworld.wolfram.com/Prime-GeneratingPolynomial.html

You can use a LUT, but that's not required to define sin and cos. You can define them perfectly fine as

sin(kπ) = 0 sin(kπ + ½π) = (-1)^k cos(kπ) = (-1)^k cos(kπ + ½π) = 0 dsin/dx = -cos dcos/dx = sin

Sine and cosine are the only smooth continuous functions that are solutions to that system of equations. There are also numerous geometric ways to define them. That's what's meant by "understanding". If all you had was the LUT and no other way to compute the value of sine, then you don't really understand what sine is.

Table lookup is a common solution when a FPU isn't available and speed is a higher priority than accuracy.

I've actually used formulas for trig functions before with IEEE double floating-point until the accuracy didn't improve.

(think it was a continued fraction or or some other technique I didn't understand, didn't -lm or something for builtin functions and worked around it lol...)

It was very slow! Makes your computer's fan spin faster, even changes the smell in the room while grinding away!

The browser is far more than a program that outputs just that one page. If one page is all we needed, we wouldn't need the program. We'd just need the page.

If we are considering all possible input as input, and all possible output as output, with our static browser in the middle, now the browser is tiny.

Basically, the output is the internet. That's huge.

But this is how theories work. The theory is the tiny thing in the middle. The data it compresses is for all valid instances as input and of corresponding output. If one calculation is all we needed, we wouldn't need theories, we would just need the answer.

The fact that I can display petabytes or a kilobyte of data with the same query and the same frontend markup just depending on the query string parameter leads me to believe you dont really know web dev.

EX: http://ilikelists.php?uid=2 , select * from users where user_id = 2; ... or http://ilikelists.php , select * from users;

> Looking at modern day web development we are seeing the opposite, where the program is many times bigger than it's output, suggesting to me that the modern frameworks do not come from any sound theory.

I tend to agree, most "modern [web] frameworks" were created as hacks to ease some other development pain and only later got polished by the application of some sound theory (or simply by enough development/use-cycles).

OTOH, since frameworks must be generic enough to, well, act as a "frame" for the actual application, one could also argue that more can be gained by some JIT-mechanism that optimizes according to the current usage pattern. (Somebody, IIRC at MIT, published something along those lines just a few weeks ago.)

I'm not sure how you get to the last assertion from the previous two statements. A lot of the bloat for modern web applications come from tracking / analytics.
His definition of "theory" is, shall we say, quite specific, though he tries to overextend it to all cases. Since data are particular and theories deal with generalities, it is impossible to take generalities and get back to particulars. Beyond some interesting mathematical results, I wouldn't take all of what Chaitin says too seriously.
When I read that statement I immediately thought of the case where some long and tedious program is written to calculate a number, or even just a single bit true/false answer to a statement which is full of arbitrary or "random" parameters (i.e. difficult to compress). I'm thinking of examples like the software Appel and Haken used in their original proof of the four colour theorem.

I think there are a couple of ways to think about this:

The algorithmic complexity Chaitin is discussing assumes "logical omniscience", i.e. when we read some logical statement/line of code (or add some bit to a program/theory) we immediately know all of the consequences. In such a setting, we never have to run a computer program to see what it produces: we know as soon as we've read the code. An compiler with such omniscience could optimise a long and tedious calculation by replacing it with the result, e.g. "4", and hence prevent programs which are larger than their output. (Note that this isn't actually possible in general, since we'd essentially be running the program at compile time, and we'd run into the halting problem, turning our compiler into a partial function).

In such cases the programs would actually be the same as their outputs, and hence don't offer any "explanation" (compression), as Chaitin puts it. Mathematicians might agree with this, since it's a common complaint that having an automated theorem prover tell us "true"/"false" isn't considered very useful, even if it's known to be correct (e.g. if we've verified the software).

This leads into the second perspective: that of "boolean blindness" ( https://existentialtype.wordpress.com/2011/03/15/boolean-bli... ). If we consider such programs from a Curry-Howard point of view, the output type corresponds to a theorem of which the program is a proof. If all we're outputting is a boolean value, then the corresponding theorem can be thought of as "there exists a boolean". If we think of our program as a proof of this statement, it's massively over-complicated; we could just write "true" or "false" (or "0"/"1", etc.), and that would be just as good a proof that a boolean exists.

The "blindness" is that trivial values like "true", "false", "0", "1", "2", etc. tell us nothing about their intended meaning: the language/representation makes no distinction between the value of "1 < 5" and the value of "solve(fermatsLastTheorem)".

We can change our program's type, such that it encodes the problem we're trying to solve, e.g. something like (in pseudo-Coq):

    forall (m : Map), exists(c : Colouring(m) | length(colours(c)) <= 4)
This forces our program to output a more complicated value than simple "true" or "false" (if this were Coq, we'd have to construct a function from Maps to pairs, where the first element of the pair is a colouring of the given map, and the second element of the pair is a Natural number equal to the difference between the number of colours used in that colouring and 4).

This brings us back into the realm of interesting, non-trivial comparisons between program length and output length. There's also an interesting distinction to be made between "proof objects" (programs which represent proofs of our theorem) and programs for constructing such proof objects. To be comparable with Chaitin, the language for constructing proofs should be turing-complete (e.g. like Coq's "Ltac"); yet we might want to avoid having our proof object's language be turing-complete, since that would permit unsound proofs.

I think it's this distinction between turing-complete and non-turing-complete languages that is the key to Chaitin's "explanations" (i.e. compression). An "elegant" (shortest) program in the proof-constructing language cannot be longer than an elegant proof-object, since we can just return that elegant proof object verbatim: the program will be as long as its output, and we have "explained" nothing, just like optimising a boolean program down to "true".

Yet an elegant proof-constructing program can be shorter than an elegant proof-object! We can't make the proof-object language total by rejecting only those programs which don't halt, since the halting problem is undecidable; our criteria for accepting/rejecting must necessarily allow too many or too few programs. If we allow too many, we're allowing invalid proofs, which we don't want. We must choose to allow too few, and hence some perfectly valid proofs will get rejected.

By having our proof-constructing language be turing-complete, we don't have to reject any valid proofs, and hence we can write proof-constructing programs which are not valid proof-objects. Some of those programs may be smaller than the smallest proof-object, and hence they "explain" part of the proof-object. For example, there may be some repetitive structure in the proof-object which the proof-object language does not allow us to optimise away. The proof-constructing language has no such restriction, allowing us to write a smaller program which iteratively builds up the proof-object.