by reidacdc 2289 days ago
Seems like the confusion arises because getchar() (or its equivalent in languages other than C) can produce an out-of-band result, EOF, which is not a character.

Procedural programmers don't generally have a problem with this -- getchar() returns an int, after all, so of course it can return non-characters, and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?

Functional programmers worry about this much more, and I got a bit of an education a couple of years ago when I dabbled in Haskell, where I engaged with the issue of what to do when a nominally-pure function gets an error.

I'm not sure I really got it, but I started thinking a lot more clearly about some programming concepts.

The amusing thing about it is that C does not guarantee that EOF is out-of-band!

ISO C says that char must be at least 8 bits, and that int must be at least 16. It is entirely legal to have an implementation that has 16-bit signed char and sizeof(int)==1. In which case -1 is a valid char, and there's no way to distinguish between reading it and getting EOF from getchar().
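
A rough sketch of what that means in practice, assuming nothing beyond standard C. The names `eof_is_out_of_band` and `count_bytes` are made up for illustration. The first checks the condition that keeps EOF distinguishable from data; the second is a read loop that would still terminate correctly on a platform where it is not.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* True when every unsigned char value fits in a non-negative int.
   fgetc()/getchar() return the byte as an unsigned char converted to
   int, so in that case a negative EOF cannot collide with real data. */
static int eof_is_out_of_band(void) {
    return UCHAR_MAX <= INT_MAX;
}

/* Hypothetical helper: counts the bytes remaining in a stream. */
static long count_bytes(FILE *in) {
    assert(in != NULL);
    long n = 0;
    for (;;) {
        int c = fgetc(in);      /* int, not char: must hold EOF too */
        /* If EOF could also be a valid byte, confirm with
           feof()/ferror() before treating it as the end. */
        if (c == EOF && (feof(in) || ferror(in)))
            break;
        n++;
    }
    return n;
}
```

On mainstream platforms the first function returns 1 and the extra feof()/ferror() test is redundant, which is presumably why so much code skips it.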

... which is why no system ever implements things this way. There are many portions of the C spec that can be ignored.

Large swaths of the C standard were built during the heyday of computer design, when you had all sorts of wacky sizes, behaviors and abstractions. Lots of "undefined behavior" is effectively deterministic, because all modern computers have converged to do so many things the same way.

TI DSPs with 16-bit char are still being made. It's a niche thing that most people will never need to care about, but it's not just a historical quirk and definitely not "no system ever".
Then there's SHARC with its 32-bit char.

Do architectures like that have non-freestanding C implementations, though? It's kinda moot if there's no getchar()...

> and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?

I am begging, please never ever do this. NaN literally exists for this reason. NaN even allows you to encode additional error context and details into the value.
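
For what it's worth, a minimal sketch of the NaN approach, assuming an IEEE 754 platform where `<math.h>` defines `NAN`. `checked_div` and `demo_nan` are hypothetical names, and the payload trick is left out because its encoding is implementation-specific.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical example: division that reports a zero divisor by
   returning NaN rather than some in-range magic value. */
static double checked_div(double a, double b) {
    if (b == 0.0)
        return NAN;             /* quiet NaN from <math.h> */
    return a / b;
}

/* NaN compares unequal to everything, itself included, so callers
   must test with isnan() instead of ==. */
static void demo_nan(void) {
    double r = checked_div(1.0, 0.0);
    assert(isnan(r));
    assert(r != r);
    assert(!isnan(checked_div(6.0, 3.0)));
}
```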

+DBL_MAX. Negative zero is an entirely valid, if rare, result of certain computations.
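
A small sketch of why negative zero makes a poor sentinel, assuming IEEE 754 arithmetic. The helper names are invented for the example.

```c
#include <assert.h>
#include <math.h>

/* -0.0 arises naturally, e.g. from multiplying zero by a negative. */
static double scale(double x, double factor) {
    return x * factor;
}

/* == cannot tell -0.0 from +0.0, so the sign bit has to be
   inspected explicitly. */
static int is_negative_zero(double x) {
    return x == 0.0 && signbit(x);
}

static void demo_negative_zero(void) {
    double z = scale(0.0, -1.0);
    assert(z == 0.0);               /* looks like an ordinary zero */
    assert(is_negative_zero(z));    /* but carries the "error" sign */
    assert(!is_negative_zero(0.0));
}
```

So a perfectly legitimate computation can produce the would-be error code, and an ordinary comparison will not notice the difference.
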
IEEE 754 has infinities as well, no need to constrain yourself to DBL_MAX :)
I taught myself C on MS-DOS in middle school, decades ago. Could have sworn that ASCII 26 was named “EOF” even if modern text files don’t include it.

This is a supplementary source of confusion.

Wikipedia supports this:

> Character 26 was used to mark "End of file" even if the ASCII calls it Substitute, and has other characters for this. Number 28 which is called "File Separator" has also been used for similar purposes. [1]

I think today we would think of character 4 (End of Transmission, Ctrl-D) as the end of file/input marker, but historically Character 26/Ctrl-Z was used, even on disk.

1: https://en.wikipedia.org/wiki/Substitute_character

See, this is why you should not believe Wikipedia.

The DOS syscall interface has no concept of an EOF character. ^Z being considered EOF was a feature of the COPY command, later replicated by the runtimes of various languages targeting DOS.

http://jdebp.info/FGA/dos-character-26-is-not-special.html

Not just DOS. CP/M also used CTRL-Z, principally because file lengths weren’t stored on disk - just the list of 128-byte blocks. So to get granularity beyond multiples of 128, you need an explicit EOF character.
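
Roughly, recovering the real length of a text file under that scheme could look like the sketch below. `text_length`, `RECORD_SIZE` and `CTRL_Z` are illustrative names, not anything from CP/M itself, and the buffer is assumed to hold whole 128-byte records.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define RECORD_SIZE 128   /* block size mentioned above */
#define CTRL_Z      0x1A  /* ASCII 26 */

/* Logical length of a text stored as whole records: everything
   before the first Ctrl-Z, or the full buffer if none is present
   (the text exactly filled its last record). */
static size_t text_length(const unsigned char *buf, size_t nrecords) {
    assert(buf != NULL);
    size_t total = nrecords * RECORD_SIZE;
    const unsigned char *end = memchr(buf, CTRL_Z, total);
    return end ? (size_t)(end - buf) : total;
}
```
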
I think TYPE would also treat ^Z as a terminator of the file. I think it was common in DOS to have binary files with a textual header followed by ^Z, that would hide the binary part.
Yea, it's confusing. https://news.ycombinator.com/item?id=22572703, read EDIT 3, I found this pretty illuminating.
What does "Procedural" vs "Functional" have to do with this? It's a choice in data type.

If by procedural you mean, nonsense, then sure... I agree that a function named `getchar` returning an `int` is procedural. :P

I suspect it was a product of the OP's musing about errors. Side effects are common in programming languages outside of pure functional languages. When you have a pure functional language, what do you do if the type you are returning can't represent an error? You also can't have side effects (for example throw an exception), so it's doubly important that you make sure your return type can encode errors. I suspect that's all they meant. The choice of wording was just unfortunate (especially the use of "procedural" -- what do I do if I can't return values??? ;-) ).
Nothing, apart from the fact that languages with type systems designed more carefully than C happen to be functional languages, to one extent or another.

(Though by the way: having functions that evaluate to a value when executed is itself a feature that belongs to the functional paradigm, although one so trivial and common that it’s not usually thought of as such. But a purely imperative/procedural way of returning values would be via out parameters or global variables.)
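
As a sketch of the out-parameter style, here is a hypothetical `read_byte` wrapper around `fgetc()` that returns nothing and reports both status and data through pointers:

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical out-parameter variant of fgetc(): *ok says whether a
   byte was read, and *out receives that byte when it was. */
static void read_byte(FILE *in, int *ok, unsigned char *out) {
    assert(in && ok && out);
    int c = fgetc(in);
    *ok = (c != EOF);
    if (*ok)
        *out = (unsigned char)c;
}
```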

The simple answer to this is that these days "functional programming" doesn't just mean the absence of side effects. It means strong type systems, algebraic data types, list comprehensions, etc. It is a distinct cultural stream in the development of programming languages. Of course "functional" has an original narrow meaning, but so do "Republican" and "Democrat".

When Rust introduced ADTs they were recognizably a concept from functional programming. It's a place or community of practice, not a purely descriptive adjective.

What they mean to say is: when I was working with a language that enforced pure functions, I had to actually consider purity. It's rare to see a way to enforce purity in procedural languages, whereas most fp langs support it.
Are we talking about even roughly the same concept of functional purity [1]? Nothing is stopping a pure function from representing EOF as -1.

Implementing IO in a "pure" way is, however, another discussion.

[1]: https://en.wikipedia.org/wiki/Pure_function

Mostly, do you know of a single procedural language with a concept of IO monads in its stdlibs?
> If by procedural you mean, nonsense, then sure

Why are you being snarky?

They clearly mean the issue of modelling partial functions which would normally be done by a side-effect in a procedural language but can’t in a functional language.

No, they imply that the handling is done by returning a negative number.

I'm being snarky, as is my nature, to highlight the madness of a function called `getchar` returning anything but a `char`.

It's not a great snark given that the C standard considers the signedness of char to be implementation defined, making -1 a valid option, sometimes.
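
To make the signedness point concrete, a sketch with two made-up helpers: one models what `getchar()` hands back for a byte, the other what that byte becomes once stored in a plain `char`.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* What getchar() would return for a given byte: the value as an
   unsigned char converted to int, hence never negative. */
static int as_getchar_result(unsigned char byte) {
    return (int)byte;
}

/* The same byte after being stored in a plain char, as in the
   classic bug `char c = getchar();`. */
static int as_plain_char(unsigned char byte) {
    char c = (char)byte;   /* implementation-defined if char is signed */
    return (int)c;
}

static void demo_signedness(void) {
    assert(as_getchar_result(0xFF) == 0xFF);
    assert(as_plain_char('a') == 'a');
}
```

Where `char` is signed and 8 bits wide, `as_plain_char(0xFF)` typically comes out as -1, which is the usual value of EOF, so the byte and the end of input become indistinguishable.
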
I'm sorry you don't find it great (I still do). Integers are not characters.

Integers are numbers like -1337, 0, and 42.

Characters are things that compose strings of text.

These are not the same kind of thing at all. Just because APIs may be leaky, and some of these APIs are held in very high regard doesn't change that fact.

In the end, integers, floating point numbers, "text\n", emojis etc. are just sequences of bytes. You choose to acknowledge it and take advantage of it, or you don't.
a char isn't a character, though. you can't add two characters together and get another character. it's a number.

getchar() gets a char. not a character.

> the madness of a function called `getchar` returning anything but a `char`

It’s effectively returning a Maybe(char).

But it's not.

A `Maybe<char>` has exactly one `None` variant. While an `int` has many, many negative values.

Also, just calling it `None` (or similar) makes clear what is meant, while `-1` is some magic value.
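
For comparison, a rough C approximation of that `Maybe<char>`. The struct and function names are hypothetical, and C will not stop you from reading `value` when it is absent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical C analogue of Maybe<char>: exactly one "none" state
   (has_value == false) instead of a whole range of negative ints. */
struct maybe_char {
    bool has_value;
    unsigned char value;   /* meaningful only when has_value is true */
};

/* Converts an fgetc()-style int result into the explicit form. */
static struct maybe_char maybe_from_int(int c) {
    struct maybe_char m = { false, 0 };
    if (c != EOF) {
        m.has_value = true;
        m.value = (unsigned char)c;
    }
    return m;
}

static struct maybe_char next_char(FILE *in) {
    assert(in != NULL);
    return maybe_from_int(fgetc(in));
}
```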

> while `-1` is some magic value

It's a documented return value. Nothing magic about it.