Hacker News new | ask | show | jobs
by jobu 2885 days ago
This reminds me a bit of the science of nanoinformatics as described in one of the Expanse novellas (The Vital Abyss):

"A thought experiment from my first course in the program: Take a bar of metal and put a single notch in it. The two lengths thus defined have a relationship that can be expressed as the ratio between them. In theory, therefore, any rational number can be expressed with a single mark on a bar of metal. Using a simple alphabetic code, a mark that calculated to a ratio of.12152205 could be read as 12-15-22-05, or “l-o-v-e.” The complete plays of Shakespeare could be written in a single mark, if it were possible to measure accurately enough. Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way. How massive amounts of information could be expressed in and retrieved from infinitesimal objects was the driving concern of my college years."

Pure fiction at this point, but it would be an interesting experiment to encode data into objects that could be expressed using the mathematical ratio of their shapes or sizes.

6 comments

> Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way.

With Planck's length being roughly 10^-35m, I'd say you'd hit the limit trying to store more than 15 bytes.

Also depends on the length of the metal rod, doesn't it?
The observable universe is 2^205.5 Planck lengths across, so you can store at most 205 bits, or 25 bytes.
And at that length your latency gets reeeeeeal bad.
This is insightful. 15 bytes is not a lot. I wonder what are other narural limits on information density? For example, magnetic field. Is there a least measurable difference?
The limit on information density is called the Bekenstein Bound, the point after which adding more information to the volume would create a black hole.

https://en.m.wikipedia.org/wiki/Bekenstein_bound

How are you calculating a value of simply 15 bytes? The proposal is that the mathematical value of the ratio of the lengths of rod and the notch will deliver a value on the number line, which in effect can be of any desired length. You actually could find a number that can represent the complete contemporary knowledge of Humanity.

The limitations mentioned in the above comment relate to the fact that to know you'd arrived at the number designed into the Notch and the Rod, you'd have to have agreed to beforehand the tolerance limits on measurement of this 'device' so that the uncertainty in measurement can be ignored and the ratio derived.

To maybe get the right number you would have to expand your possible number of ratios (at a certain measurement sensitivity you would have a certain number of lengths you could use, and thus certain number of ratios would be available to you) and to get just the right number to describe your whole "machine language expression of the most advanced expert systems" you would have to delve into tolerances of the Plank length.

Or adjust the size of your rod length and notch, make them bigger, to get that sweet number with a 'poorer' level of sensitivity.

.. Or find a number with enough usable number sequences to serve the purpose and program in the numbers surrounding gibberish as markers/jump points to the next sequence of usable numbers. I suppose you could find enough usable sequences in the full expression of pi (as it's rational expression is without end) to write a program that can decode the full linux kernel out of it.

Assuming the length of the rod is 1m, and you have a resolution of one plank length, there are 10e35 possible ratios you can express (because that's the number of possible locations of the notch). that's about 2^113, which is a number which fits in 15 bytes of information. As discussed below, if you also allow for the bar to be the size of the observable universe, this doesn't increase by much. A notch or ratio is linear, information and combinatorics grows a lot lot faster than that.
How would you say a rod to notch ratio of 22/7 compares? How much information would you say that has?
Meaningless on its own. What matters is how many distinct ratios are possible.
> Meaningless on its own.

That's the crux. What we're saying is that, of these set of ratios, there exists one whose infinite decimal representation contains our intended data. Yes we'd be limited to Planck length resolution, but the idea is that we would determine such a ratio and construct the notch and bar in such a way that it yields the decimal sequence desired; the # of particular ratios is not relevant.

I wouldn't be surprised if it can be proven this can't be done, but then that would be proper question to ask.

you can loosely think of it as writing down a number in unary where each "digit" takes 1.6 x 10^-35 m space to draw.
But what does that have to do with the emergent ratio between the notch and the rod? There may be 10^35 possible steps, but that does not mean that the answer, the ratio of the notch and the rod, will be limited by that. The answer will come from the number line, where any irrational number can have however many trailing numbers. If the ratio of the rod and the notch is 22/7, how much information would you say that is?
If 'x' marks the notch of a rod of length "1", [0---------x------------1], then the implied ratio is not x/(1-x), but x/1 (so the ratio is always < 1.) Even so, your question could be "what is the information content of 1/7" (the presumed implication being that 1/7, while periodic, has an infinite decimal representation.)

But that is not the direction we are interested in. We would like, given a message, such as "l-o-v-e", or 12-15-22-05, or 0.12152205, to figure out what is the ratio that uniquely specifies it. As we can only mark one notch, we can create "only" h ~= 10^35 ratios, or represent h unique messages. We know how to distinguish between h unique elements with log(h) bits (we just enumerate them from 1 to h and write that number down in binary.)

Let's assume that the rod is 1 m in length, and say I wrote a book that is perfectly represented by the number 0.abcdefg...yz. If that number is perfectly represented by one of the ratios in set 'h', then have I not stored more infromation than 15 bytes?

As for x/(1-x), why not? And why limit ourselves to a 1 m rod? Why not a 22 m rod with a 7 m notch? I could then define the method of decoding the information via (Rod length)/(notch length). The I'd have 'infinite' information in the form of expression of pi.

My main issue with the parent comment is that they imply only 15 bytes of data could be stored via this method. I think that's prespoterous as the number of ratios my be only 15 bytes, the ratios themselves can have any possible size.

It becomes more a game of probability rather than that of exact numbers. Will you find the right number, from set 'h', that matches exactly what you wanted to say?

    0.abcd...xyz = [(notch length=alpha*plank length)/(rod length=beta*plank length)]
where alpha and beta are just any variables that you play with until you solve the equation.
The same has been said about Pi (3.14). If you can compute, store and search enough the digits of Pi, you can reference anything by just providing the 'start' and 'finish' locations. Unfortunately, with enough digits of Pi, the 'start' and 'finish' numbers can get quite long themselves.
Years ago I tried this and basically ended up proving that if Pi is random and you are "compressing" random data, on average the start and finish numbers together are at least as long as the numbers you are trying to "compress."
Did you proove it formally or only by experiment. If it were a real proof, did you publish it somewhere?
In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input. The best you can hope for is an algorithm that compresses the kind of data that humans want to store, at the expense of making other data a bit longer. If you're trying to compress random data then you just can't do it.

Here's a proof: consider the strings of length n or less, suppose there are M of them in total. Their average length is just the sum of all their lengths divided by M, and the average length of their compressed versions is just the total length of the compressed versions divided by M. Since the compression is lossless the compressed strings must all be different.

Since there are M strings, if any of them mapped to a string of length more than n then there must be some string of length at most n not being mapped to, so the average length can be improved by instead mapping that string to the shorter string. So any optimal compression method must map only to the strings of length at most n.

So the M outputs are just the M inputs, possibly permuted. So their total length is the same, and hence their average length is the same.

> any lossless compression algorithm has the property that the output is (on average) at least as long as the input.

The article you’ve linked says nothing about average. It says that for every algorithm there’s at least some input files that increase the size. It even explains more about that:

Any lossless compression algorithm that makes some files shorter must necessarily make some files longer, but it is not necessary that those files become very much longer. Most practical compression algorithms provide an "escape" facility that can turn off the normal coding for files that would become longer by being encoded. In theory, only a single additional bit is required to tell the decoder that the normal coding has been turned off for the entire input

Thanks. I realized this just after I posted it, so I wrote the proof into my comment instead.
>In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input

I don't think this is true. If it was, lossless compression would be useless in a lot of applications. It's pretty easy to come up with a counter example.

E.g.

(simple huffman code off the top of my head, not optimal)

symbol -> code

"00" -> "0"

"01" -> "10"

"10" -> "110"

"11" -> "111"

If "00" will appear 99.999% of the time, and the other 3 symbols only appear 0.001% of the time, the output will "on average" be slightly more than half the length of the input.

Sure, I'm assuming that (a) you are trying to encode all strings of length at most n and (b) you have the uniform distribution over those strings. This makes sense in the original context of encoding random data.
i did somewhat the same thing. it introduced me to programming... i thought how about multiple start indices and fixed width? you can then compress the list of start indices in the same manner until you reach sufficient compression :D
http://www.angio.net/pi/

My 11 digit phone number occurs around the 115 millionth digit, a grand saving of two digits.

Wow, isn’t there a really low probability of finding your phone number in the first 200m digits of pi? (0.09995% in first 100m) I’m tempted to start throwing a dictionary of phone numbers through this pi lookup to find your number, call you, and verify, I think you could quickly narrow in on your phone number given the information above.
I think you'll have trouble. Assuming the first digit of the phone number is somewhere between 114.5 million and 115.5 million, you have 1 million potential 11 digit numbers to check.

There are 10^11 sequences with 11 digits. The number of people in the USA is 3×10^8, and we can assume there is roughly 1 number per person (some people don't have a phone number and some people have more than one, but it turns out that the exact approximation won't matter unless we're a few of orders of magnitude off). So about 0.3% of 11 digit sequences are valid phone numbers.

So there are approximately 0.3% × 1000000 = 3000 people with phone numbers around the 115 millionth digit of pi. You have no way of knowing which one of those people is sjcsjc.

It'd be a great 1 person-audience magic trick to call up all 3000 of those people and say, "Hi sjcsjc, I found your phone number in Pi".
Derren Brown did a great series of tricks based on probability.. your comment reminded me on the one that showed a video of him flipping 10 heads in a row on an unbiased coin. He revealed them that he had done it by flipping a coin for hours until he got that sequence.
Here's an implementation of precisely that, as a filesystem:

https://github.com/philipl/pifs

Discussed here previously (2014):

https://news.ycombinator.com/item?id=8018818

The indexes will, on average, have more digits than the sequence you compress. This is true for any infinite sequence of digits, not only Pi.
World's most painful compression algorithm: Finds a mathematical series of infinite digits, and the offset into it, that most efficiently compresses data passed in. Probably have to chunk the data up to make this efficient.

Of course one can argue that all current real life compression algorithms are aiming to simulate this, and that a brute force algorithm is one of those "after turning the sun into a CPU, still won't have enough compute power to finish the problem" types of solutions.

Yep, not fiction at all, pretty much what is used for state-of-the-art compression.
The fiction part is being able to notch the bar sufficiently precise to store any significant amount of data.
You can if you start from the middle out.
:-)
The late mathematics popularizer Martin Gardner wrote about this concept in the 1970s (I think his example involved reducing the Encyclopedia Britannica to a notch), although I don't know if the idea was unique to him or if he was popularizing an earlier idea.
Reminds me of 'pulse position modulation' [1].

[1] https://en.wikipedia.org/wiki/Pulse-position_modulation

You could split the package of data into chunks and place multiple notches on the bar. You'd need to include enough information to allow the chunks to be sorted into their original order for that to work.

As if this were a practical means of storing data.