| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by _jomo 2980 days ago
	As pi is infinite, it also stores any imaginable information. Someone actually went ahead and implemented a file system based on this idea: https://github.com/philipl/pifs

7 comments

amelius 2980 days ago

> As pi is infinite, it also stores any imaginable information.

Whether that is true is related to the normalness [1] of the number pi, but this property has not been proven. See also: [2].

[1] https://en.wikipedia.org/wiki/Normal_number

[2] http://www.askamathematician.com/2009/11/since-pi-is-infinit...

link

mkirklions 2980 days ago

That was a fun read. Ive always thought about this when people say 'THE UNIVERSE IS INFINITE SO WE ARE EATING DINNER AT HELMS DEEP SOMEWHERE'.

Naw man, physics still apply, you dont get gandalf just because you went a few million light years toward Andromedia

link

humblebee 2980 days ago

There is a great Numberphile video loosely related to this (no magic though)

"Googol and Googolplex - Numberphile" - https://www.youtube.com/watch?v=8GEebx72-qs

link

gowld 2980 days ago

The difference is that Pi is apparently normal, up to all extensive existing evidence. The Universe is not normal. Anyway, Physics doesn't preclude Helms Deep.

link

andrepd 2980 days ago

I think normalness is a stronger property than "possesses all sequences of the alphabet of its digits", since it means that + "all subsequences are equally likely"

link

jgtrosh 2980 days ago

He claims 100% compression, but that clearly disregards the digit ID which is potentially quite large to store. Any idea as to how to compute stats for actual compression (size(content)/size(index))?

link

tzahola 2980 days ago

Assuming that pi is a normal number [1], its digits are uniformly distributed. Therefore the density of a given binary sequence of length N in pi’s binary expansion is 2^-N.

So on average, the offset for a given binary sequence will be on the order of 2^N (I’ve used some hand-waving here). You need N bits for that, you you’ve gained basically nothing.

[1] https://en.m.wikipedia.org/wiki/Normal_number

link

jimktrains2 2980 days ago

A friend and I tried to do this in highschool and realized it wasn't so useful except for a cool factor.

link

space_fountain 2980 days ago

I think reading a bit farther it becomes clear that it's a parody

Edited to fixed terrible spelling

link

mikeash 2980 days ago

You mean “parody.” This near-homophone is quite confusing here!

link

space_fountain 2980 days ago

Oops. My only excuse, weak as it is. Is that I typed that on my phone. Fixing it now thanks.

link

jgtrosh 2980 days ago

Sure it is! It's an interesting problem nonetheless :-)

link

TazeTSchnitzel 2980 days ago

Size of the index would just be log2(index), right?

link

jasode 2980 days ago

Your calculation works backward from already knowing the index.

I assume jgtrosh's question is more of a probability and estimation question. E.g. the string "Hello world" takes up 11 bytes and what's the size of the index into pi? To generalize a little more, to store any 11-byte sequence, what is the estimated size to store its index? To find the exact 256^11 byte sequence in pi, there will be many cases where the sizeof(index) is greater than the data itself.

link

Jabbles 2980 days ago

Yes but index would be ~2^(length of content)

link

tzs 2980 days ago

> He claims 100% compression, but that clearly disregards the digit ID which is potentially quite large to store

That reminds me of a compression program I once wrote. It had the following properties:

1. It accepts any non-empty file other than a file consisting of a single 0 byte.

2. It is easily reversible.

3. The output is never larger than the input.

4. The output is smaller than the input for some files.

5. If a file is larger than 1 byte, iterating enough times will always eventually produce a smaller file. What is "enough times" depends on the file content.

(I wrote this partly as a joke, and partly as a counterexample to some proofs I'd seen on Usenet about the limitations of compression programs where the proofs had not been sufficiently precise in specifying the conditions necessary for them to apply).

It is actually quite simple:

1. If the file is empty or a single byte that is 0, report that the input is not acceptable and exit.

2. If the file is N bytes, treat it as if it is an 8N bit unsigned integer, I.

3. If I is 0, the output file is N-1 bytes all with value 0xFF.

4. If I is not 0, subtract 1, and output the result.

(Decompression is left as an exercise for the reader).

Another way to look at this is to imagine that we have a sorted list of all possible files. The primary sort key is file size. The secondary sort key is the numerical value of the file when treated as an 8N bit unsigned integer, where N is the file size.

My compression algorithm is simply to replace each file with the file immediately ahead of it on that file list.

Viewed that way, it is pretty obvious that all my claims are true. It accepts any non-empty file other than the first file on the list (single 0 byte). To reverse it, replace the input with the file immediately after it from the list. The list is ordered by file size, so stepping down in the list never gives a larger file, and sometimes gives a smaller file. The output file is on the list, so you can obviously iterate.

The catch, of course, is that while yes, you can take your 1 TB of arbitrary data (even true random data!) and apply my algorithm iteratively to get down to, say, a 1 byte file, and yes, you can apply the decompression algorithm iteratively to recover your 1 TB from, to do that you have to know how many times to iterate on the decompression end.

The number of iterations required for an uncompressed file of N bytes to compress down to a byte is on the order of 2^(8N), so you will need about N bytes to store the iteration count.

So net result is that it can replace an N byte file with a 1 byte file...but it is going to drop N bytes of metadata on you that you'll need to store somewhere.

link

John_KZ 2980 days ago

That's not necessarily true and probably is false. What we know about Pi is that it includes an infinite number of elements, but this says nothing about the structure of those elements. An infinitely long random sequence will probably hold every pattern possible (I'm not a mathematician so correct me if I'm wrong) but any structured or repeating sequence doesn't.

link

chimeracoder 2980 days ago

> An infinitely long random sequence will probably hold every pattern possible (I'm not a mathematician so correct me if I'm wrong) but any structured or repeating sequence doesn't.

This is not true. Pi is probably irrational, which means we know it doesn't repeat itself. However, it is not proven to be normal, so it's a conjecture (but not fact) that it contains every finite sequence of digits at some point.

link

gowld 2980 days ago

That's the worst possible typo for this conversation.

Pi is provably irrational, and probably normal!

link

chimeracoder 2980 days ago

> That's the worst possible typo for this conversation.

Damn autocorrect! And it's too late to edit it, too.

Yes, that should say "provably irrational".

link

hsnewman 2980 days ago

Can pi be recursive then? Can it contain itself? I think not.

link

avian 2980 days ago

No. If pi would contain itself it would be a rational number. We have a proof that pi is irrational, hence it cannot contain itself.

link

gowld 2980 days ago

Specifically, if pi contains itself, that means "pi - pi/b^k" is a terminating fraction N in base b", where b is your choice of base (like base 10), and k is some positive integer. So

pi = N/(1-b^k) which is rational.

(And if you set k=0, you get that pi "contains itself" in the sense that pi from the 1st digit onward is of course pi)

link

klmr 2980 days ago

> Can it contain itself?

Well it does, trivially.

You’re probably thinking of set theory but the file system is about storing information in a sequence and an offset. Seen that way, π can be stored with offset 0.

link

gnulinux 2979 days ago

If it's normal then it contains all finite strings uniformly in its base-b expansions. Pi is not a finite string so it wouldn't have to contain itself uniformly. More specifically if it contains itself then it's a rational number, but we know that pi is not a rational number. Note that we don't know whether pi is a normal real number.

link

vollmond 2980 days ago

To be fair, no other filesystem could contain pi either, right?

link

sethgecko 2980 days ago

How do you even calculate pi with such accuracy?

link

snarfy 2980 days ago

https://en.wikipedia.org/wiki/Bailey%E2%80%93Borwein%E2%80%9...

link

tzahola 2980 days ago

Undortunately with this solution you’re still storing the same amount of information as you’d normally do. It’s just encoded as offsets into pi’s decimal expansion. You can’t cheat entropy.

link

misja111 2980 days ago

This is brilliant, thank you!

edit: am I the only one who thinks that this whole pifs filesystem was meant as a joke?

link

petters 2980 days ago

Everyone knows it is a joke.

link