by l33t7332273 577 days ago
I feel like if the FAQ requires not using filename shenanigans, then the sleight of hand was illegal the whole way.
2 comments

He didn't use filenames, he used files, and if that were illegal, Mike shouldn't have accepted it.
He does use the filenames. If you change the filenames randomly (such that the files sort differently), it does not work.
> such that the files sort differently

But if you change them without making them sort differently, everything is fine. He depends on the order, not the filenames. You could even remove the filenames entirely, as long as you patch the code to account for such a strange environment.

Not really a good point. If the order of bytes does not matter, then I can compress any file of your liking to O(log n) size :P
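To make the O(log n) remark concrete: if byte order carried no information, a file would be fully described by how many times each of the 256 byte values occurs. A rough sketch, where the function names are made up for illustration:

```python
from collections import Counter


def order_free_encoding(data: bytes) -> Counter:
    """Keep only how often each byte value occurs, discarding order."""
    return Counter(data)


def counter_bits(n: int) -> int:
    """Bits for 256 counters that can each hold a value up to n."""
    return 256 * max(1, n.bit_length())


# Two different orderings of the same bytes get the same "compressed" form.
same = order_free_encoding(b"banana") == order_free_encoding(b"aaabnn")

# For a file of a million bytes the counters need only a few thousand bits.
bits_for_million = counter_bits(10**6)
```

The counters grow with log n while the file grows with n, which is exactly why order has to count as information.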
Wait, whose point are you saying is not good?

I'm saying order does matter and it's the only thing that matters about the separate files using this code.

I think the question is, if you remove the filenames entirely, how do you keep the parts ordered?

(Someone else suggested sorting them by file size.)

Not in any significant way. The decompressor could be changed to require you to feed the files into it in the correct order or expect some other sorting.

What you're saying is like saying that you encoded info in filenames because decompress.sh expects a file "compressed.dat" to exist. It's not describing any meaningful part of the scheme.

The filenames contain information that you need in some way for the scheme to work.

You are combining different parts and inserting a missing byte every time you combine the files. You need to combine the parts in the correct order, and the order is part of the information that makes this work.
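A minimal sketch of that dependency, assuming the parts were produced by cutting the data at every '5' and dropping that byte (the helper names are hypothetical, not taken from the actual submission):

```python
def split_at_marker(data: bytes, marker: bytes = b"5") -> list[bytes]:
    """Cut the data at every marker byte and drop the marker itself."""
    return data.split(marker)


def join_in_order(parts: list[bytes], marker: bytes = b"5") -> bytes:
    """Reinsert one marker byte between each pair of consecutive parts."""
    return marker.join(parts)


original = b"ab5cde5f5gh"
parts = split_at_marker(original)

# Every dropped marker is one byte that no longer appears in any part.
saved = len(original) - sum(len(p) for p in parts)

restored = join_in_order(parts)
# Same parts, different order: the result is no longer the original.
scrambled = join_in_order(list(reversed(parts)))
```

The "saved" bytes come back only if the parts are joined in the original order, so the order has to be stored somewhere.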

If the ordering isn't coming from filenames, it needs to come from somewhere else.

You could do the same splitting trick, but only split at a '5' that makes each part longer than the previous one. The "compression" would be worse, so you'd need a larger starting file, but you could still satisfy the requirements this way and be independent of the filenames. The decompressor would just sort the files by increasing length before merging.
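Roughly, as a sketch under the assumption that every part must be strictly longer than the one before it (function names are hypothetical):

```python
def split_increasing(data: bytes, marker: bytes = b"5") -> list[bytes]:
    """Split at marker bytes so that part lengths strictly increase."""
    parts: list[bytes] = []
    pos = 0       # start of the part currently being built
    prev_len = 0  # length of the previous part
    while True:
        # Skip far enough that this part ends up longer than the last one.
        idx = data.find(marker, pos + prev_len + 1)
        if idx == -1:
            break
        parts.append(data[pos:idx])
        prev_len = idx - pos
        pos = idx + 1  # drop the marker byte
    tail = data[pos:]
    # If the leftover is too short, undo the last split to keep lengths sorted.
    if parts and len(tail) <= prev_len:
        tail = parts.pop() + marker + tail
    parts.append(tail)
    return parts


def merge_by_length(parts: list[bytes], marker: bytes = b"5") -> bytes:
    """Recover the order from part lengths alone, then reinsert markers."""
    return marker.join(sorted(parts, key=len))


original = b"a5bb5ccc5dddd5e"
parts = split_increasing(original)
# Hand the parts over in the wrong order; length alone restores it.
restored = merge_by_length(list(reversed(parts)))
```

Parts may still contain a '5' internally; only the ones used as split points are dropped.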
Nice idea, but doesn't this require a linear increase of the length of the partial files and a quadratic size of the original file?

If the length of a file is X, then in the next file you must skip the first X characters and look for a "5" that on average is at position X+128. So the average length of the Nth file is 128*N, and if you want to save C bytes, the original file should be ~128*C^2/2 bytes (instead of the linear 128*C in the article).
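A quick back-of-the-envelope check of that estimate, taking the 128-byte average gap from the comment as given rather than as a verified figure:

```python
def linear_size(saved: int, gap: int = 128) -> int:
    """Input size when every part is about `gap` bytes long."""
    return gap * saved


def quadratic_size(saved: int, gap: int = 128) -> int:
    """Input size when the Nth part is about gap * N bytes long."""
    return sum(gap * n for n in range(1, saved + 1))


# Saving 100 bytes under each scheme.
linear_100 = linear_size(100)
quadratic_100 = quadratic_size(100)
approx_100 = 128 * 100**2 // 2
```

The exact sum is 128*C*(C+1)/2, which the ~128*C^2/2 approximation tracks closely once C is large.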

That's a neat idea.
The FAQ was not part of the challenge statement. It was part of the newsgroup, I believe.