This reminds me a bit of the science of nanoinformatics as described in one of the Expanse novellas (The Vital Abyss):
"A thought experiment from my first course in the program: Take a bar of metal and put a single notch in it. The two lengths thus defined have a relationship that can be expressed as the ratio between them. In theory, therefore, any rational number can be expressed with a single mark on a bar of metal. Using a simple alphabetic code, a mark that calculated to a ratio of .12152205 could be read as 12-15-22-05, or “l-o-v-e.” The complete plays of Shakespeare could be written in a single mark, if it were possible to measure accurately enough. Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way. How massive amounts of information could be expressed in and retrieved from infinitesimal objects was the driving concern of my college years."
Pure fiction at this point, but it would be an interesting experiment to encode data into objects that could be expressed using the mathematical ratio of their shapes or sizes.
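As a rough illustration of the alphabetic code in the quote, here is a minimal Python sketch. The function names are made up for this example, and it assumes two decimal digits per letter (a=01 ... z=26) and that the reader already knows how many letters to expect.

```python
from fractions import Fraction

def encode_word(word: str) -> Fraction:
    """Map a word to a ratio in [0, 1) using two digits per letter (a=01 ... z=26)."""
    digits = "".join(f"{ord(ch) - ord('a') + 1:02d}" for ch in word.lower())
    return Fraction(int(digits), 10 ** len(digits))

def decode_ratio(ratio: Fraction, letters: int) -> str:
    """Recover `letters` letters from the ratio; the length must be agreed beforehand."""
    scaled = ratio * 10 ** (2 * letters)
    digits = f"{int(scaled):0{2 * letters}d}"
    return "".join(chr(int(digits[i:i + 2]) + ord('a') - 1)
                   for i in range(0, len(digits), 2))

ratio = encode_word("love")
print(ratio, float(ratio))      # the exact ratio and its decimal form
print(decode_ratio(ratio, 4))   # back to the original word
```

Exact fractions are used here because a float would silently lose digits once the message gets longer than a handful of letters, which is the measurement-precision problem in miniature.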
> Or the machine language expression of the most advanced expert systems, though by then the notch might be small enough that Planck’s constant got in the way.
With Planck's length being roughly 10^-35m, I'd say you'd hit the limit trying to store more than 15 bytes.
This is insightful. 15 bytes is not a lot.
I wonder what other natural limits there are on information density. For example, a magnetic field: is there a least measurable difference?
The limit on information density is called the Bekenstein Bound, the point after which adding more information to the volume would create a black hole.
How are you calculating a value of simply 15 bytes? The proposal is that the mathematical value of the ratio of the lengths of rod and the notch will deliver a value on the number line, which in effect can be of any desired length. You actually could find a number that can represent the complete contemporary knowledge of Humanity.
The limitations mentioned in the above comment relate to the fact that to know you'd arrived at the number designed into the Notch and the Rod, you'd have to have agreed to beforehand the tolerance limits on measurement of this 'device' so that the uncertainty in measurement can be ignored and the ratio derived.
To maybe get the right number you would have to expand your possible number of ratios (at a certain measurement sensitivity you would have a certain number of lengths you could use, and thus a certain number of ratios would be available to you), and to get just the right number to describe your whole "machine language expression of the most advanced expert systems" you would have to delve into tolerances of the Planck length.
Or adjust the size of your rod length and notch, make them bigger, to get that sweet number with a 'poorer' level of sensitivity.
.. Or find a number with enough usable number sequences to serve the purpose and program in the numbers surrounding gibberish as markers/jump points to the next sequence of usable numbers. I suppose you could find enough usable sequences in the full expansion of pi (as its decimal expansion is without end) to write a program that can decode the full Linux kernel out of it.
Assuming the length of the rod is 1 m, and you have a resolution of one Planck length, there are about 10^35 possible ratios you can express (because that's the number of possible locations of the notch). That's about 2^116, which is a number that fits in 15 bytes of information. As discussed below, if you also allow for the bar to be the size of the observable universe, this doesn't increase by much. The number of notch positions grows only linearly with the length of the bar; information and combinatorics grow a lot, lot faster than that.
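A quick back-of-the-envelope check of that estimate, as a Python sketch. The constants are the commonly quoted Planck length and a rough diameter for the observable universe; the function name is just for this example.

```python
import math

PLANCK_LENGTH_M = 1.616255e-35   # Planck length in metres
UNIVERSE_DIAMETER_M = 8.8e26     # rough diameter of the observable universe in metres

def notch_capacity_bits(bar_length_m: float, resolution_m: float = PLANCK_LENGTH_M) -> float:
    """Bits needed to name one notch position among bar_length / resolution positions."""
    return math.log2(bar_length_m / resolution_m)

for label, length in [("1 m bar", 1.0), ("universe-sized bar", UNIVERSE_DIAMETER_M)]:
    bits = notch_capacity_bits(length)
    print(f"{label}: {bits:.1f} bits, about {math.ceil(bits / 8)} bytes")
```

Making the bar some 10^27 times longer adds only about 90 bits, because the capacity is the logarithm of the number of distinguishable positions.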
But what does that have to do with the emergent ratio between the notch and the rod? There may be 10^35 possible steps, but that does not mean that the answer, the ratio of the notch and the rod, will be limited by that. The answer will come from the number line, where any irrational number can have however many trailing numbers. If the ratio of the rod and the notch is 22/7, how much information would you say that is?
If 'x' marks the notch of a rod of length "1", [0---------x------------1], then the implied ratio is not x/(1-x), but x/1 (so the ratio is always < 1.) Even so, your question could be "what is the information content of 1/7" (the presumed implication being that 1/7, while periodic, has an infinite decimal representation.)
But that is not the direction we are interested in. We would like, given a message, such as "l-o-v-e", or 12-15-22-05, or 0.12152205, to figure out what is the ratio that uniquely specifies it. As we can only mark one notch, we can create "only" h ~= 10^35 ratios, or represent h unique messages. We know how to distinguish between h unique elements with log2(h) bits (we just enumerate them from 1 to h and write that number down in binary).
The same has been said about Pi (3.14). If you can compute, store and search enough of the digits of Pi, you can reference anything by just providing the 'start' and 'finish' locations. Unfortunately, with enough digits of Pi, the 'start' and 'finish' numbers can get quite long themselves.
Years ago I tried this and basically ended up proving that if Pi is random and you are "compressing" random data, on average the start and finish numbers together are at least as long as the numbers you are trying to "compress."
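For anyone who wants to try this at toy scale, here is a Python sketch. It generates digits of pi with Gibbons' spigot algorithm and counts how many decimal digits it takes to write down a (start, length) reference for every three-digit string. The 1000-digit window and the function names are choices made for this example, not anything from the comments above.

```python
from itertools import islice

def pi_digits():
    """Yield the decimal digits of pi (3, 1, 4, 1, 5, ...) using Gibbons' spigot."""
    q, r, t, k, n, l = 1, 0, 1, 1, 3, 3
    while True:
        if 4 * q + r - t < n * t:
            yield n
            q, r, t, k, n, l = (10 * q, 10 * (r - n * t), t, k,
                                (10 * (3 * q + r)) // t - 10 * n, l)
        else:
            q, r, t, k, n, l = (q * k, (2 * q + r) * l, t * l, k + 1,
                                (q * (7 * k + 2) + r * l) // (t * l), l + 2)

DIGITS = "".join(str(d) for d in islice(pi_digits(), 1000))

def pi_reference(pattern: str):
    """Return (start, length) locating pattern in DIGITS, or None if it is absent."""
    start = DIGITS.find(pattern)
    return None if start < 0 else (start, len(pattern))

def reference_cost(pattern: str):
    """Decimal digits needed to write down the start offset plus the length."""
    ref = pi_reference(pattern)
    return None if ref is None else len(str(ref[0])) + len(str(ref[1]))

costs = [reference_cost(f"{i:03d}") for i in range(1000)]
found = [c for c in costs if c is not None]
print(f"{len(found)} of 1000 three-digit strings occur in the first {len(DIGITS)} digits")
if found:
    print(f"average reference cost: {sum(found) / len(found):.2f} digits for 3 digits of payload")
```

Strings that do not occur in the window would need a longer window and therefore even longer offsets, which is the effect described above.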
In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input. The best you can hope for is an algorithm that compresses the kind of data that humans want to store, at the expense of making other data a bit longer. If you're trying to compress random data then you just can't do it.
Here's a proof: consider the strings of length n or less, suppose there are M of them in total. Their average length is just the sum of all their lengths divided by M, and the average length of their compressed versions is just the total length of the compressed versions divided by M. Since the compression is lossless the compressed strings must all be different.
Since there are M strings, if any of them mapped to a string of length more than n then there must be some string of length at most n not being mapped to, so the average length can be improved by instead mapping that string to the shorter string. So any optimal compression method must map only to the strings of length at most n.
So the M outputs are just the M inputs, possibly permuted. So their total length is the same, and hence their average length is the same.
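The counting step behind that argument can be checked by brute force for small n. A minimal Python sketch, with an arbitrary n chosen for illustration:

```python
from itertools import product

def bit_strings_up_to(n: int) -> list[str]:
    """All bit strings of length 0 through n, including the empty string."""
    return ["".join(bits)
            for length in range(n + 1)
            for bits in product("01", repeat=length)]

n = 4
all_strings = bit_strings_up_to(n)
exactly_n = [s for s in all_strings if len(s) == n]
shorter = [s for s in all_strings if len(s) < n]

# There are 2**n inputs of length n but only 2**n - 1 strictly shorter outputs,
# so no lossless scheme can shorten every input of length n.
print(len(exactly_n), len(shorter))
```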
> any lossless compression algorithm has the property that the output is (on average) at least as long as the input.
The article you’ve linked says nothing about average. It says that for every algorithm there’s at least some input files that increase the size. It even explains more about that:
Any lossless compression algorithm that makes some files shorter must necessarily make some files longer, but it is not necessary that those files become very much longer. Most practical compression algorithms provide an "escape" facility that can turn off the normal coding for files that would become longer by being encoded. In theory, only a single additional bit is required to tell the decoder that the normal coding has been turned off for the entire input
>In fact, any lossless compression algorithm has the property that the output is (on average) at least as long as the input
I don't think this is true. If it were, lossless compression would be useless in a lot of applications. It's pretty easy to come up with a counterexample.
E.g.
(simple huffman code off the top of my head, not optimal)
symbol -> code
"00" -> "0"
"01" -> "10"
"10" -> "110"
"11" -> "111"
If "00" appears 99.999% of the time, and the other 3 symbols together appear only 0.001% of the time, the output will "on average" be slightly more than half the length of the input.
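The code table above can be checked with a short Python sketch. It assumes the remaining 0.001% is split equally between the three rare symbols, which the comment does not specify.

```python
CODE = {"00": "0", "01": "10", "10": "110", "11": "111"}
DECODE = {codeword: symbol for symbol, codeword in CODE.items()}

def encode(bits: str) -> str:
    """Encode an even-length bit string two input bits at a time."""
    return "".join(CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(coded: str) -> str:
    """Decode by reading bits until they match a codeword (the code is prefix-free)."""
    out, current = [], ""
    for bit in coded:
        current += bit
        if current in DECODE:
            out.append(DECODE[current])
            current = ""
    return "".join(out)

# Assumption: the three rare symbols share the remaining probability equally.
PROBS = {"00": 0.99999, "01": 0.00001 / 3, "10": 0.00001 / 3, "11": 0.00001 / 3}
expected_bits = sum(PROBS[s] * len(CODE[s]) for s in CODE)
ratio = expected_bits / 2   # every symbol stands for 2 input bits
print(f"expected output/input ratio: {ratio:.6f}")
```

Under a uniform distribution the same table averages (1 + 2 + 3 + 3) / 4 = 2.25 output bits per 2 input bits, so it only wins because the input is heavily skewed.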
I did somewhat the same thing. It introduced me to programming... I thought, how about multiple start indices and fixed width? You can then compress the list of start indices in the same manner until you reach sufficient compression :D
Wow, isn’t there a really low probability of finding your phone number in the first 200m digits of pi? (0.09995% in first 100m) I’m tempted to start throwing a dictionary of phone numbers through this pi lookup to find your number, call you, and verify. I think you could quickly narrow in on your phone number given the information above.
I think you'll have trouble. Assuming the first digit of the phone number is somewhere between 114.5 million and 115.5 million, you have 1 million potential 11 digit numbers to check.
There are 10^11 sequences with 11 digits. The number of people in the USA is 3×10^8, and we can assume there is roughly 1 number per person (some people don't have a phone number and some people have more than one, but it turns out that the exact figure won't matter unless we're a few orders of magnitude off). So about 0.3% of 11 digit sequences are valid phone numbers.
So there are approximately 0.3% × 1000000 = 3000 people with phone numbers around the 115 millionth digit of pi. You have no way of knowing which one of those people is sjcsjc.
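The estimate is easy to reproduce. A small Python sketch using the same assumptions as the comment (one phone number per US resident, a window of one million starting positions):

```python
SEQUENCES = 10 ** 11          # all possible 11-digit strings
PHONE_NUMBERS = 3 * 10 ** 8   # roughly one number per US resident (assumption from the comment)
WINDOW = 1_000_000            # candidate start positions between 114.5M and 115.5M

valid_fraction = PHONE_NUMBERS / SEQUENCES
expected_matches = valid_fraction * WINDOW
print(f"{valid_fraction:.1%} of sequences are valid; about {expected_matches:.0f} candidates in the window")
```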
World's most painful compression algorithm: Finds a mathematical series of infinite digits, and the offset into it, that most efficiently compresses data passed in. Probably have to chunk the data up to make this efficient.
Of course one can argue that all current real life compression algorithms are aiming to simulate this, and that a brute force algorithm is one of those "after turning the sun into a CPU, still won't have enough compute power to finish the problem" types of solutions.
The late mathematics popularizer Martin Gardner wrote about this concept in the 1970s (I think his example involved reducing the Encyclopedia Britannica to a notch), although I don't know if the idea was unique to him or if he was popularizing an earlier idea.
You could split the package of data into chunks and place multiple notches on the bar. You'd need to include enough information to allow the chunks to be sorted into their original order for that to work.
As if this were a practical means of storing data.
I was confused as to how you could represent a bootable CD image in printable characters. It turns out that you can't. This is a tweet of a perl script which creates a cd.iso file that you can then boot from. The perl script significantly decompresses the data in the tweet.
That said, this is a playable game in around 60 bytes of actual data which is impressive.
It sounds like you've been trained to hate something that you actually love. That's kind of sad. Couldn't you just wear the awesome shirt while avoiding the negative aspects of neckbeards?
Not CD images, but there is a DOS C compiler that generates executables with only printable characters (and also no self-modifying code that could contain non-printable characters)
The compression is basic run-length encoding, leveraging the Perl repetition operator (x), and the property of the Perl print/say functions that they concat items passed in a list before writing to STDOUT.
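For readers who don't know Perl, the same run-length idea in a minimal Python sketch. The sample data is a made-up stand-in, not the tweet's actual image: 16 empty 2048-byte sectors followed by the first bytes of an ISO 9660 volume descriptor.

```python
from itertools import groupby

def rle_encode(data: bytes) -> list[tuple[int, int]]:
    """Collapse the data into (byte value, run length) pairs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(data)]

def rle_decode(runs: list[tuple[int, int]]) -> bytes:
    """Expand the pairs again, much like Perl's `x` repetition operator does."""
    return b"".join(bytes([value]) * count for value, count in runs)

# Hypothetical sample: an empty system area, a descriptor header, some padding.
image = b"\x00" * (16 * 2048) + b"\x01CD001\x01" + b"\x00" * 100
runs = rle_encode(image)
print(f"{len(image)} bytes collapse into {len(runs)} runs")
assert rle_decode(runs) == image
```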
I tried xz -9 on it and found, to my surprise, that it was actually longer than RLE!
Then I tried gzip -9, because perhaps that has a smaller header? Yup, saved a few bytes, now it's about the same size. Finally, I remembered that bzip2 does a lot better on text than gzip, and who knows, it might also have a shorter header than xz. Again, a few more bytes saved! Down to 223, where the original is 249 bytes (including the 'say' part but excluding the unnecessary delimiting apostrophes or the rest of the command).
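The header-overhead effect is easy to poke at with Python's standard library. This sketch uses a made-up, mostly-zero payload rather than the data from the tweet, so the byte counts it prints will differ from the numbers above.

```python
import bz2
import gzip
import lzma

# Hypothetical payload: mostly zero bytes with a few meaningful ones in the middle.
payload = b"\x00" * 2048 + b"\x01CD001\x01" + b"\x00" * 64

results = {
    "gzip": gzip.compress(payload, compresslevel=9),
    "bzip2": bz2.compress(payload, compresslevel=9),
    "xz": lzma.compress(payload, preset=9),
}

for name, blob in results.items():
    print(f"{name}: {len(blob)} bytes")
```

For inputs this small, the fixed container overhead (magic bytes, flags, checksums) can outweigh whatever the better algorithm saves, which would be consistent with the ordering reported above.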
Most of the "compression" is zero bytes due to fixed offsets of various things (e.g. the first 16 sectors of an ISO 9660 image are a "system area" not used by the actual file system).
As someone that develops almost exclusively in high-level languages on top of many levels of abstraction, it's nice to see what can be accomplished close to the metal.
This reminds me of Steve Gibson's SpinRite, which (from what I recall) is a fully functional disk recovery utility written entirely in assembly. https://www.grc.com/spinrite.htm. Say what you want about the man, but this is something that's saved me on at least one occasion, and is smaller than things I produce that do a lot lot less.
What I've read is that SpinRite simply just reads and writes to the disk, triggering the disk firmware to reallocate bad sectors. I think it just tries to read many times, which can sometimes help.
The other argument is that the various things SR tries to "manage" (sector interleave, getting various timing parameters "perfect", etc) were only relevant with ST-506 (!!) and similar disks from the 80s/very early 90s, and that anything remotely modern (even IDE, virtually 100% of SATA) generally doesn't provide enough low-level control surface that trying to micro-manage the disk's behavior will do anything particularly special.
Of course, I'm sure each manufacturer has their own tools and widgets that can use undocumented proprietary SATA/SCSI commands to control the drive's behavior at a very low level, but those kinds of tools are a) rare as hens' teeth and b) probably very easy to break disks with due to poor UI design and lack of documentation. Chances are the most expensive data recovery centers probably have some of these tools, and more importantly the training to know how not to kill HDDs with them :P
TL;DR: Yes, SR works, but probably just as well as ddrescue; as always, if you think a disk is this side of dead and you think you have a chance without specialized tools, just imaging it is probably the best first step, because SR and all other tools will of course stress it.
With all of this said, I really, really like SR's startup animation :3 and I agree that it's refreshingly small.
Your snake game doesn't even have food for the snake to eat? The snake just grows without eating, until the head collides with the tail somewhere? Pssh, amateur hour...
My first thought was to make random sectors of the first hard drive the "food"... For better or for worse, this stuff really brings me back to my teens. Nice job!
Tron is a great game for fitting into a very small space. Back in high school I spent a lot of time optimizing Tron on the TI-82 to fit in as few bytes as possible. It looks like I got it down to 152 bytes: (only about 80 bytes of actual Z80 assembly code)
No it isn't.
In TRON you leave a trail behind. In Snake, you grow as you eat and drag your body along the path you crawled.
TRON's trail is static, Snake's body is dynamic.
And that difference only exists to support the single player vs multiplayer dynamic. If Snake left a Tron tail then it would be a very short, unfun experience. If Tron didn't leave a permanent trail then matches would last too long.
This is great, and ironically it’s sitting next to another HN article where, in the comments, someone is actually defending a NYT news article that weighs in at 6MB.
In a time when most software is filled with superfluous waste and endless layers of abstraction and libraries, it’s nice to see that the art of writing minimal software is not completely lost.
So page bloat is a real issue, not just "old man rants about good old days". A shame really because the rest of the page is pretty lightweight, 5.2kb for the content and 2.2kb for the css.
Even worse is that the static preview of the monster gif is the second largest element.
I used to have 250MB free with my internet contract until the ISP silently upgraded it to 500MB recently (must have been the past year or so, not sure when).
If you use it only occasionally to read a few articles, you can do fine with only a few megabytes. Heck, I'd almost say kilobytes if bloat wasn't so common. Anyway, that's until shit like this comes along. If you were truly trying to watch a video---sure, that uses a lot of that tiny data bundle in one go, but a gif that should have been a video truly leaves you wondering why was this necessary?!
You just made me think of a GIF-to-mp4 conversion service that runs as a proxy.
Sadly, because of the "HTTPS everywhere!!!11" thing, such a service would not be viable (it would need to rewrite the <img> to a <video> in order to work, of course).
Opera were offering something like this for a while. With the support of a browser and HTTP proxying it's not a problem, the SSL terminates at the proxy and is re-encrypted under the proxy's SSL.
Many web services will take an uploaded gif and turn it to webm before showing it, e.g. Twitter.
For sure, although as a counterpoint I’ve also seen programmers do the reverse: spending a lot of effort building bad abstractions to maximize code reuse and minimize size, when really we would have been better off with slightly longer code that was clearer and that meshed better with our problem domain.
You can do “magic” with relatively easy things. For example, spend a weekend with https://forthsalon.appspot.com or processing.org and see where you can get to...
This is the second time this week I've seen someone note that ISO standards are expensive - are they copyrighted or something? Why doesn't someone just publish them online for free?
I bought the ISO 14000 Standard Document (Environmental Management) for $170 earlier this year, so it is not expensive for a company. Not sure how the "technical documents" ala ISO 9660 differs in price.
The certification on the other hand is a bit more costly though. My estimation of the certification cost for a ~20 people company if you do it as frugally as possible ended up at $18000 for the first year investment, and $6200 recurring the following years.
Man, I love these things. Back before Twitter upped their character limits, I remember a trick to cram more data into a tweet was to abuse how Twitter counts characters (it attempts to count visually rather than by byte): using a ton of multipart emojis or larger Unicode characters could more than double the information that fit in a tweet.
Multibyte chars is optimizing for Twitter, not for a set amount of bytes. If you try to fit in 140 bytes and you use >140 bytes because they are multibyte chars, then yeah, you're cheating. But if you're trying to "fit in a tweet", I'd say that's perfectly fair game.
Anyhow, two games... cool, but to be perfectly honest, I thought the game was going to be more like the real snake than just drawing a non-overlapping line on screen. A more impressive game might be more impressive than two games.
The explanation is quite good but is there even more detail on what the assembly instructions are doing on a line by line basis? I’ve only written MIPS assembly and that was a long time ago.
yeah paste it into your plain text editor first and then do a code review and then save it to a jump drive and spin up the old backup computer that’s been wiped clean and then open up a sandboxed vm and then remove the wifi chip and then run it
I grew up (career-wise) on CTOS systems with 286/386 processors that could address the full 16MB in protected mode without the memory extenders or expanders that were available for DOS back in the day. Also preemptive multitasking. It was a great OS to learn on. More info - https://web.archive.org/web/20080828190425/http://www.byte.c...
It's my understanding that UEFI actually comes up directly into long mode?
"UEFI firmware performs those same steps, but also prepares a protected mode environment with flat segmentation and for x86-64 CPUs, a long mode environment with identity-mapped paging. The A20 gate is enabled as well."
Looks like the turbo button needs to be in the on position for this one: 0.05 FPS in a qemu KVM on my old P8600... I presume the game loop uses the hardware clock :P
For some reason, emulators (at least the ones I tried) wait 4x what real machines wait when you use BIOS int 15h, AH=86h. You can tweak the code if you want to play at a faster speed.
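For context, that BIOS wait call takes its delay in microseconds as a 32-bit value split across CX (high word) and DX (low word). A small Python helper for working out the register values when tweaking the delay; the function name and the 100 ms example are made up for illustration and are not the game's actual values.

```python
def int15_wait_registers(microseconds: int) -> tuple[int, int]:
    """Split a delay into the CX (high word) and DX (low word) values for INT 15h, AH=86h."""
    if not 0 <= microseconds <= 0xFFFFFFFF:
        raise ValueError("delay must fit in 32 bits")
    return microseconds >> 16, microseconds & 0xFFFF

cx, dx = int15_wait_registers(100_000)   # a hypothetical 100 ms frame delay
print(f"CX={cx:#06x} DX={dx:#06x}")
```

Since CX is the high word, each increment of CX adds 65536 microseconds, so small changes there make large differences in game speed.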
There’s probably a sound explanation for this discrepancy...
I couldn't find "bf86 fec1". I did however find "b486 fec1".
"bf86 9042" made it literally so fast I physically couldn't keep up. The following worked for me (w/ QEMU on old (no KVM) Pentium M), this may be too fast on newer machines: