by gardaani 453 days ago
Wikipedia has a good explanation of why the PNG magic number is 89 50 4e 47 0d 0a 1a 0a. It has some good features, such as the end-of-file character for DOS and detection of line ending conversions. https://en.wikipedia.org/wiki/PNG#File_header
3 comments

The old PNG specification also explained the rationale: http://www.libpng.org/pub/png/spec/1.2/PNG-Rationale.html#R....

But the new spec doesn't explain: https://www.w3.org/TR/2003/REC-PNG-20031110/

That is unfortunate. Not enough standards have rationale or intent sections.

On the one hand, I sort of understand why they don't: "If it is not critical and load-bearing to the standard, why is it in there? It is just noise that will confuse the issue."

On the other hand, it can provide very important clues as to the why of the standard, not just the what. While the standards authors understood why they did things the way they did, many years later, when we read it, we are often left with more questions than answers.

Clear rationales don't sell well: better to obfuscate specs with hidden design decisions and build a complexity moat.
At first I wasn't sure why it contained a separate Unix line feed when you would already be able to detect a Unix to DOS conversion from the DOS line ending:

0D 0A 1A 0A -> 0D 0D 0A 1A 0D 0A

But of course this isn't to try and detect a Unix-to-DOS conversion, it's to detect a roundtrip DOS-to-Unix-to-DOS conversion:

0D 0A 1A 0A -> 0A 1A 0A -> 0D 0A 1A 0D 0A

Certainly a very well thought-out magic number.

Unix2dos is idempotent on CRLF; it doesn’t change it to CRCRLF. Therefore, if the magic number contained only a CRLF, a conversion that turned lone LFs elsewhere in the file into CRLF wouldn’t be caught by the magic-number check. This isn’t about roundtrip conversion.
It's also detecting when a file on DOS/Windows is opened in "ASCII mode" rather than binary mode. When opened in ASCII mode, "\r\n" is automatically converted to "\n" upon reading the data.
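To make the byte-level argument concrete, here is a rough Python sketch of what the two conversions do to the signature. The helper names (`unix2dos`, `dos2unix`, `looks_like_png`) are made up for illustration and only approximate the real tools; `dos2unix` here also stands in for what a text-mode read does to "\r\n".

```python
# Sketch only: approximates the line-ending conversions discussed above.
PNG_SIGNATURE = bytes.fromhex("89504e470d0a1a0a")


def unix2dos(data: bytes) -> bytes:
    """Turn every lone LF into CRLF, leaving existing CRLF pairs unchanged."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def dos2unix(data: bytes) -> bytes:
    """Turn every CRLF into LF (roughly what a text-mode read does on DOS/Windows)."""
    return data.replace(b"\r\n", b"\n")


def looks_like_png(data: bytes) -> bool:
    """Check only the 8-byte signature, not the rest of the file."""
    return data.startswith(PNG_SIGNATURE)


if __name__ == "__main__":
    print(unix2dos(PNG_SIGNATURE).hex(" "))  # 89 50 4e 47 0d 0a 1a 0d 0a
    print(dos2unix(PNG_SIGNATURE).hex(" "))  # 89 50 4e 47 0a 1a 0a
    # A hypothetical signature with only a CRLF would survive unix2dos untouched,
    # which is why the trailing lone LF is needed.
    print(unix2dos(b"\x89PNG\r\n") == b"\x89PNG\r\n")  # True
```

Under these assumptions the CRLF catches the DOS-to-Unix direction (including text-mode reads) and the trailing lone LF catches the Unix-to-DOS direction, so either conversion breaks the signature check.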
I can count the number of times I've had binary file corruption due to line ending conversion on zero hands. And I'm old enough to have used FTP extensively. Seems kind of unnecessary.
“Modern” FTP clients would auto-detect whether you were sending text or binary files and thus disable line-ending conversions for binary.

But go back to the 90s and before, and you’d have to manually select whether you were sending text or binary data. Often these clients defaulted to text and so you’d end up accidentally corrupting files if you weren’t careful.

The pain was definitely real

And, if you were using a Windows client talking to a Unix server, you didn't want to get a text file in binary mode, since most programs at the time couldn't handle Unix line endings. This is much better nowadays, to the point that it rarely matters on either side of the platform divide which type of line endings you use.
It can easily happen with version control across Windows and Unix clients. I’ve seen it a number of times.
Not for binary files. I've seen it for text files, sure.
I’ve seen it with binary files a number of times.
Have you tried using git with Windows clients?

There are so many random line conversions going on and the detection on what is a binary file is clearly broken.

I don't understand why the default would be anything but "commit the file as is"

> I don't understand why the default would be anything but "commit the file as is"

Because it’s not uncommon for dev tools on Windows to generate DOS line endings when modifying files (for example when adding an element to an XML configuration file, all line endings of the file may be converted when it is rewritten out from its parsed form), and if those were committed as-is, you’d get a lot of gratuitous changes in the commit and also complaints from the Unix users.

For Git, the important thing is to have a .gitattributes file in the repository with “* text=auto” in it (plus more specific settings as desired). The text/binary auto-detection works mostly fine.
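As a rough illustration, a minimal .gitattributes along those lines might look like the following. Only the first rule comes from the comment above; the two "more specific" rules are hypothetical examples and would depend on the repository.

```
# Let Git decide per file whether it is text, and normalize line endings for text files
* text=auto

# Hypothetical overrides: never touch PNGs, always check out shell scripts with LF
*.png binary
*.sh text eol=lf
```

Marking known binary formats explicitly means you don't have to rely on the auto-detection for them.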

Up until just a few years ago, Notepad on Windows could not handle Unix-style line endings. It probably makes sense now to adopt the as-is convention, but for a while, it made more sense to convert when checking out, and then to prevent spurious diffs, convert back when committing.
Line endings between Windows and Unix-like systems were so painful that when I started development on my shell scripting language, I wrote a bunch of code to allow Linux to handle Windows files and vice versa.

Though this has nothing to do with FTP. I’d already abandoned that protocol by then.

Yes, I have. For binary files it's never an issue because Git detects those and doesn't do line ending conversion.

And I agree "commit the file as-is" should be the default - what programming editor can't handle unix newlines?