Hacker News
by damck 2994 days ago
Dunno if it's just a mobile problem, but most of the code has ^M instead of line breaks; it's hard to browse one-line source.
6 comments

It's because the code was written on classic Mac OS, which used CR newlines rather than the LF or CRLF styles used today.
lol, I knew LF and CRLF were a thing, but only CR? Interesting :D
Taken literally just CR seems very odd.

You probably know this, but LF is "line feed" which advances the feed.

CR is "carriage return" which puts the carriage at the start of the line.

The combination makes sense because it does both: it returns the carriage and advances the line.

Just LF kind of makes sense because when you advance the line you can think of the line being empty.

But conceptually just CR suggests returning the carriage but that doesn't imply the newline.

Of course the terminology is originally from typewriters so it doesn't have to make sense, but it does seem odd that some systems chose just CR.

On a typewriter from around the time of the first Macs, the return key normally moves back to the beginning and moves down a line, so that's probably why they picked it. They also called the key "return" rather than "enter."
On a classic non-electric typewriter the carriage was returned by pushing a lever on the right side of the carriage. The lever could be pressed to the left relative to the carriage which would move the paper up. At the end of travel relative to the carriage if you kept pushing it would move the carriage back to the left. Hence CR and LF could be accomplished in one motion. If you wanted to advance the paper many lines, you'd push the lever multiple times - probably with the carriage all the way to the left the whole time.

On another note, I find it incredibly weird describing an "everyday" item like a typewriter because many people on HN may have never used one!

Strange that 'return' takes you to a new line, rather than returning you to the previous line. English!
It's a simple mechanical artefact. Carriage return moves the paper carriage all the way to the right (returning it to its initial position), while line feed will advance the paper vertically (feeding a line through the carriage). With that in mind, line feed and carriage return on their own make about as little sense.

On later typewriters both the line feed and carriage return were integrated into a single key or lever, which was just called carriage return or return. From this perspective, an encoding using a lone CR for newlines might make more sense than one using a lone LF. But neither choice really makes intuitive sense in buffered, electronic systems. It's just how it is.

(Nevermind)
Brings back memories of staring at a keyboard unsuccessfully looking for a "return" key to satisfy "Press return to continue..."
Somehow the friends who told me how to use a PC always used "Return" instead of "Enter", but this was in Germany where we didn't have any text on this key, just a big arrow pointing down and left.
CR is a common line ending for RS-232 devices. I've got 3 or 4 in a cabinet for my current job which are "line-based" and which use CR as the line-ending. To issue a command to one of these devices, you terminate the command with CR and then the device processes it. These same devices will also send response data with CR line endings.

This is partly why stty and termios have options for CRLF translation on terminals. I'm sure there's some historical reason for that.
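The CR-terminated command/response framing described above can be sketched roughly as follows. This is only an illustration in Python that works on byte strings; the function names and the `STATUS?` command are hypothetical, and no real device or serial library is involved.

```python
def frame_command(command: str) -> bytes:
    """Terminate a command with a lone CR, as such line-based devices expect."""
    return command.encode("ascii") + b"\r"


def split_responses(buffer: bytes) -> tuple[list[str], bytes]:
    """Split received bytes on CR.

    Returns the complete lines plus any trailing partial line, which the
    caller should keep and prepend to the next chunk read from the port.
    """
    *complete, rest = buffer.split(b"\r")
    return [line.decode("ascii") for line in complete], rest


# Hypothetical exchange: one command out, one and a half responses back.
outgoing = frame_command("STATUS?")
lines, pending = split_responses(b"OK\rTEMP 21\rPAR")
```

Keeping the unterminated tail separate matters because a serial read can return in the middle of a line.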

I thought it was to do with the fact that you might want to use them separately sometimes? For instance you might want to repeat a line without a newline for a "bold" effect, or with underscores for underlining.
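A toy model of that overstrike trick, assuming a printer where CR only resets the column, LF only advances the paper, and printing twice in the same cell stacks the glyphs. The `print_on_paper` and `underline` helpers are made up for illustration.

```python
def underline(text: str) -> str:
    """Print text, return the carriage without feeding, then strike underscores."""
    return text + "\r" + "_" * len(text) + "\n"


def print_on_paper(stream: str) -> list[list[str]]:
    """Simulate the page: each row is a list of cells, each cell holds
    every glyph that was struck at that position."""
    rows: list[list[str]] = [[]]
    row, col = 0, 0
    for ch in stream:
        if ch == "\r":        # carriage return: back to column 0, same line
            col = 0
        elif ch == "\n":      # line feed: next line, column unchanged
            row += 1
            if row == len(rows):
                rows.append([])
        else:
            line = rows[row]
            while len(line) <= col:
                line.append("")
            line[col] += ch   # overstrike whatever is already there
            col += 1
    return rows


page = print_on_paper(underline("ab"))
```

In this model a lone LF leaves the column where it was, which is the "staircase" effect a bare LF produces on hardware that takes the codes literally.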
It's no more odd than just 'LF'. At the end of the day, something has to represent 'end of line'. The outlier here is really MS DOS, wasting a perfectly good byte on pretend-lineprinter codes.
The internet also uses CRLF for newlines since about 1973: https://tools.ietf.org/rfc/rfc542.txt

So historically speaking, MS DOS is doing the right thing.

1971, really - https://tools.ietf.org/html/rfc158

In 1973, the 'internet' was yay big:

https://twitter.com/workergnome/status/807704855276122114/ph...

Apple sold more Apple I's than that in its first few months of existence.

MS-DOS isn’t alone in that. https://en.wikipedia.org/wiki/Newline#Representations_in_dif... claims “Atari TOS, Microsoft Windows, DOS (MS-DOS, PC DOS, etc.), DEC TOPS-10, RT-11, CP/M, MP/M, OS/2, Symbian OS, Palm OS, Amstrad CPC, and most other early non-Unix and non-IBM operating systems” used it.

The real surprise is the BBC Micro, which used LF+CR.

DOS did it because CP/M did it.

If I had to guess I'd say CP/M was developed for extra dumb teletypes that needed both to properly handle a newline. So many quirks in terminals date back to the days when everybody was just making it up as they went along. Legacy support is the root of most braindamage.

There's an entry point in the BBC Micro's OS: a routine that prints a character, or LF+CR if the character is CR (13). This routine prints the CR second, so that when it was originally called with 13 it can fall through into the main, non-translating character print routine with 13 in the accumulator, and then return to the caller that way too. (These two routines promise to preserve the accumulator.) Saves a couple of bytes in the ROM.

(There's also an entry point partway through the wrapper that just prints a newline. DRY and all that.)

The code is not exactly this, but differs from it in no relevant way:

    .osasci \ print char, translating CR to newline
        cmp #13:bne oswrch
    .osnewl \ print newline
        lda #10:jsr oswrch
        lda #13:\fall through
    .oswrch \ print char without translation
        pha
        ...
        pla
        rts
The Atom's ROM does the same thing, so they presumably just copied this for the BBC. After all, it doesn't really matter which order you print them.
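For anyone who doesn't read 6502, here is a rough Python model of the same control flow. It is a sketch, not the ROM: the accumulator is modelled as a return value, the output device as a `bytearray`, and the fall-through as a plain call.

```python
LF, CR = 10, 13


def oswrch(a: int, out: bytearray) -> int:
    """Print the character in A without translation; A is preserved."""
    out.append(a)
    return a


def osnewl(out: bytearray) -> int:
    """Print a newline as LF then CR; exits through oswrch with A = 13."""
    oswrch(LF, out)
    return oswrch(CR, out)  # models the fall-through into oswrch


def osasci(a: int, out: bytearray) -> int:
    """Print A, translating CR into a newline; A is preserved either way."""
    if a != CR:
        return oswrch(a, out)
    return osnewl(out)  # A was 13 on entry and is 13 again on exit


screen = bytearray()
for code in b"HI\r":
    osasci(code, screen)
```

Printing the CR last is what lets `osasci` hand back 13 without a separate save and restore.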
A lone CR is certainly more odd than a LF...LF is implicitly a new line. CR by definition is a return to the beginning of the line; literally NOT the end of the line!

Who would choose 'beginning of line' to mean 'end of line'?? Oh, Apple :)

There's nothing more 'implicitly new line' about either, in a text file. But if you're particularly set on hardware analogies - there's a 'return' key on keyboards and no 'line feed' one.
Except that LF was also designated as NL before Apple came onto the scene. They were deliberately incompatible.
You're pulling things out of your ass. I'm interested in comments like yours. At some point between reading the previous comment and responding to it, you must have had some thought that goes, roughly, "Hey, I'm just going to make some shit up now and post it." Right? How else does it happen?

Apple's is not the only ecosystem that settled on carriage return. Who do you think was around for them to deliberately break compatibility with in 1977? Kildall?

You think when Woz was designing the Apple II he was going "oh better make this incompatible with UNIX"?

Just like some people will defend Apple to the death, others will desperately assign malice to literally anything they do. Funny company.

It was? By whom? What were they 'deliberately incompatible' with?
Well, they could just apply a minimal cleanup of the line endings before posting.
Then the people who want to compile it rather than complain about it would have to go through an extra hoop to fix it again. They published it the right way.
Plus, if they replaced the ^Ms, what would we have to complain about? This truly is the best way. People can complain, and people can compile.
That is an excellent point although I suspect your question is going to be answered in spades when people notice who the developer posting the code is.
I don't know anything about this developer?
Agreed, at least make sure it will still compile with line endings changed before submitting pull requests.

This builds with CodeWarrior 10 (from 1996) which is probably OK with non-Mac line endings, but there are other old Mac codebases on GitHub using older toolchains that require CR. (e.g. Pararena 2: http://bslabs.net/2016/11/13/building-pararena/)

A PPC Mac running OS X 10.4 is basically the only way to work with both git and classic Mac dev environments. I'll be trying this out myself.

I still have my Power MachTen CD. I used to love wasting time porting GNU/Linux source to that oddball environment.
A year or two ago I built dropbear on Power MachTen, worked great and just needed a couple fixes to bring it back into compatibility with GCC 2.x.

Playing around with Professional MachTen (for 68k) is a real trip though. 4.3BSD, an even older GCC, and it implements virtual memory and protection by taking over the system memory manager. (You actually have to restart when quitting it)

It would be so cool to have something like MachTen on iOS. Some people have tried, but the restrictions on executable pages really limit things.

Eh, I prefer when they don't clean up anything. It feels more like real archival work.
If you click on the "raw" view it will render properly in the browser.

You might also need to force your browser to display with "Western" text encoding (that's what it's called in Firefox, not sure about other browsers).

There's a CR->LF conversion at:

https://github.com/chungy/shockmac/
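For reference, a conversion like that can be sketched in a few lines of Python. This is not necessarily how the linked repository was produced; the function names are made up, and it works on raw bytes so that whatever text encoding the files use is left alone.

```python
def to_lf(data: bytes) -> bytes:
    """Convert CRLF and lone CR line endings to LF."""
    # Handle CRLF first so each pair becomes one LF rather than two.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


def to_cr(data: bytes) -> bytes:
    """Convert CRLF and lone LF line endings back to classic Mac OS CR."""
    return data.replace(b"\r\n", b"\r").replace(b"\n", b"\r")


classic = b"int main(void)\r{\r    return 0;\r}\r"
modern = to_lf(classic)
```

Going back with `to_cr` restores the original bytes as long as the input used only CR endings to begin with.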

Viewing the raw file, at least in Chrome on OS X, shows line breaks.
Same thing on desktop.
I put in a PR to fix that: https://github.com/NightDiveStudios/shockmac/pull/2

Best thing about GPL code is that we can fix it!

Changing line endings isn't a "fix"... if you do that, the code is unusable on the system where you would actually work on it: System 7 or Mac OS 8.
I'm pretty sure CodeWarrior was able to deal with any kind of line endings: CR, LF, CR+LF.
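As an illustration of what "deal with any kind of line endings" can mean in practice (this says nothing about how CodeWarrior actually did it), a reader only has to try CR+LF before the single-byte endings. The helper names below are hypothetical.

```python
import re
from collections import Counter

# CRLF must come first in the alternation so it is not read as CR then LF.
_NEWLINE = re.compile(rb"\r\n|\r|\n")
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}


def split_lines(data: bytes) -> list[bytes]:
    """Split on CR, LF or CRLF, whichever appears, even mixed in one file."""
    return _NEWLINE.split(data)


def count_line_endings(data: bytes) -> Counter:
    """Tally which line-ending styles a file contains."""
    return Counter(_NAMES[m.group()] for m in _NEWLINE.finditer(data))


sample = b"a\rb\r\nc\nd\r"
tally = count_line_endings(sample)
```

A tally like this is also a quick way to check whether a file is pure CR before deciding to convert it.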
This might be a dumb question (I spend most of my days in Ruby), but why can't this compile with gcc and make? It's all just C (and a few C++ files), right?
The fix that needs to be made here is in Git, which fails to process line endings in a sane way (and includes a bunch of insane, poorly documented config options that approach the problem completely sideways).