Hacker News new | ask | show | jobs
by _jomo 2980 days ago
As pi is infinite, it also stores any imaginable information. Someone actually went ahead and implemented a file system based on this idea:

https://github.com/philipl/pifs

7 comments

> As pi is infinite, it also stores any imaginable information.

Whether that is true is related to the normalness [1] of the number pi, but this property has not been proven. See also: [2].

[1] https://en.wikipedia.org/wiki/Normal_number

[2] http://www.askamathematician.com/2009/11/since-pi-is-infinit...

That was a fun read. Ive always thought about this when people say 'THE UNIVERSE IS INFINITE SO WE ARE EATING DINNER AT HELMS DEEP SOMEWHERE'.

Naw man, physics still apply, you dont get gandalf just because you went a few million light years toward Andromedia

There is a great Numberphile video loosely related to this (no magic though)

"Googol and Googolplex - Numberphile" - https://www.youtube.com/watch?v=8GEebx72-qs

The difference is that Pi is apparently normal, up to all extensive existing evidence. The Universe is not normal. Anyway, Physics doesn't preclude Helms Deep.
I think normalness is a stronger property than "possesses all sequences of the alphabet of its digits", since it means that + "all subsequences are equally likely"
He claims 100% compression, but that clearly disregards the digit ID which is potentially quite large to store. Any idea as to how to compute stats for actual compression (size(content)/size(index))?
Assuming that pi is a normal number [1], its digits are uniformly distributed. Therefore the density of a given binary sequence of length N in pi’s binary expansion is 2^-N.

So on average, the offset for a given binary sequence will be on the order of 2^N (I’ve used some hand-waving here). You need N bits for that, you you’ve gained basically nothing.

[1] https://en.m.wikipedia.org/wiki/Normal_number

A friend and I tried to do this in highschool and realized it wasn't so useful except for a cool factor.
I think reading a bit farther it becomes clear that it's a parody

Edited to fixed terrible spelling

You mean “parody.” This near-homophone is quite confusing here!
Oops. My only excuse, weak as it is. Is that I typed that on my phone. Fixing it now thanks.
Sure it is! It's an interesting problem nonetheless :-)
Size of the index would just be log2(index), right?
Your calculation works backward from already knowing the index.

I assume jgtrosh's question is more of a probability and estimation question. E.g. the string "Hello world" takes up 11 bytes and what's the size of the index into pi? To generalize a little more, to store any 11-byte sequence, what is the estimated size to store its index? To find the exact 256^11 byte sequence in pi, there will be many cases where the sizeof(index) is greater than the data itself.

Yes but index would be ~2^(length of content)
> He claims 100% compression, but that clearly disregards the digit ID which is potentially quite large to store

That reminds me of a compression program I once wrote. It had the following properties:

1. It accepts any non-empty file other than a file consisting of a single 0 byte.

2. It is easily reversible.

3. The output is never larger than the input.

4. The output is smaller than the input for some files.

5. If a file is larger than 1 byte, iterating enough times will always eventually produce a smaller file. What is "enough times" depends on the file content.

(I wrote this partly as a joke, and partly as a counterexample to some proofs I'd seen on Usenet about the limitations of compression programs where the proofs had not been sufficiently precise in specifying the conditions necessary for them to apply).

It is actually quite simple:

1. If the file is empty or a single byte that is 0, report that the input is not acceptable and exit.

2. If the file is N bytes, treat it as if it is an 8N bit unsigned integer, I.

3. If I is 0, the output file is N-1 bytes all with value 0xFF.

4. If I is not 0, subtract 1, and output the result.

(Decompression is left as an exercise for the reader).

Another way to look at this is to imagine that we have a sorted list of all possible files. The primary sort key is file size. The secondary sort key is the numerical value of the file when treated as an 8N bit unsigned integer, where N is the file size.

My compression algorithm is simply to replace each file with the file immediately ahead of it on that file list.

Viewed that way, it is pretty obvious that all my claims are true. It accepts any non-empty file other than the first file on the list (single 0 byte). To reverse it, replace the input with the file immediately after it from the list. The list is ordered by file size, so stepping down in the list never gives a larger file, and sometimes gives a smaller file. The output file is on the list, so you can obviously iterate.

The catch, of course, is that while yes, you can take your 1 TB of arbitrary data (even true random data!) and apply my algorithm iteratively to get down to, say, a 1 byte file, and yes, you can apply the decompression algorithm iteratively to recover your 1 TB from, to do that you have to know how many times to iterate on the decompression end.

The number of iterations required for an uncompressed file of N bytes to compress down to a byte is on the order of 2^(8N), so you will need about N bytes to store the iteration count.

So net result is that it can replace an N byte file with a 1 byte file...but it is going to drop N bytes of metadata on you that you'll need to store somewhere.

That's not necessarily true and probably is false. What we know about Pi is that it includes an infinite number of elements, but this says nothing about the structure of those elements. An infinitely long random sequence will probably hold every pattern possible (I'm not a mathematician so correct me if I'm wrong) but any structured or repeating sequence doesn't.
> An infinitely long random sequence will probably hold every pattern possible (I'm not a mathematician so correct me if I'm wrong) but any structured or repeating sequence doesn't.

This is not true. Pi is probably irrational, which means we know it doesn't repeat itself. However, it is not proven to be normal, so it's a conjecture (but not fact) that it contains every finite sequence of digits at some point.

That's the worst possible typo for this conversation.

Pi is provably irrational, and probably normal!

> That's the worst possible typo for this conversation.

Damn autocorrect! And it's too late to edit it, too.

Yes, that should say "provably irrational".

Can pi be recursive then? Can it contain itself? I think not.
No. If pi would contain itself it would be a rational number. We have a proof that pi is irrational, hence it cannot contain itself.
Specifically, if pi contains itself, that means "pi - pi/b^k" is a terminating fraction N in base b", where b is your choice of base (like base 10), and k is some positive integer. So

pi = N/(1-b^k) which is rational.

(And if you set k=0, you get that pi "contains itself" in the sense that pi from the 1st digit onward is of course pi)

> Can it contain itself?

Well it does, trivially.

You’re probably thinking of set theory but the file system is about storing information in a sequence and an offset. Seen that way, π can be stored with offset 0.

If it's normal then it contains all finite strings uniformly in its base-b expansions. Pi is not a finite string so it wouldn't have to contain itself uniformly. More specifically if it contains itself then it's a rational number, but we know that pi is not a rational number. Note that we don't know whether pi is a normal real number.
To be fair, no other filesystem could contain pi either, right?
How do you even calculate pi with such accuracy?
Undortunately with this solution you’re still storing the same amount of information as you’d normally do. It’s just encoded as offsets into pi’s decimal expansion. You can’t cheat entropy.
This is brilliant, thank you!

edit: am I the only one who thinks that this whole pifs filesystem was meant as a joke?

Everyone knows it is a joke.