| HN Mirror

by reidacdc 2289 days ago

Seems like the confusion arises because getchar() (or its equivalent in languages other than C) can produce an out-of-band result, EOF, which is not a character.

Procedural programmers don't generally have a problem with this -- getchar() returns an int, after all, so of course it can return non-characters, and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?

Functional programmers worry about this much more, and I got a bit of an education a couple of years ago when I dabbled in Haskell, where I engaged with the issue of what to do when a nominally-pure function gets an error.

I'm not sure I really got it, but I started thinking a lot more clearly about some programming concepts.
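The point about getchar() returning an int can be sketched in C. This is only an illustration under ordinary assumptions (8-bit char, int wider than char); `count_bytes` is a hypothetical helper name, not something from the comment above. Keeping the result in an int is what keeps EOF out-of-band:

```c
#include <stdio.h>

/* Count the bytes in a stream. The result of getc() is kept in an int so
   that EOF (a negative int) stays distinct from every unsigned char value,
   including 0xFF. */
long count_bytes(FILE *fp)
{
    long n = 0;
    int c;                      /* int, not char */

    while ((c = getc(fp)) != EOF)
        n++;
    return n;
}
```

If `c` were declared as char instead, a 0xFF byte could compare equal to EOF on platforms where char is signed, and the loop would stop early.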

4 comments

int_19h 2289 days ago

The amusing thing about it is that C does not guarantee that EOF is out-of-band!

ISO C says that char must be at least 8 bits, and that int must be at least 16. It is entirely legal to have an implementation that has 16-bit signed char and sizeof(int)==1. In which case -1 is a valid char, and there's no way to distinguish between reading it and getting EOF from getchar().
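On such an implementation, the usual workaround is to ask the stream itself whether EOF was real. A minimal sketch, assuming the stream's end-of-file and error indicators were clear before the call; `read_char` is a hypothetical name:

```c
#include <stdbool.h>
#include <stdio.h>

/* Read one char from fp. Returns true and stores the value in *out on
   success. Returns false only when the stream reports end-of-file or a
   read error, so a valid char that happens to equal EOF is still
   delivered. Assumes the stream indicators were clear before the call. */
bool read_char(FILE *fp, int *out)
{
    int c = getc(fp);

    if (c == EOF && (feof(fp) || ferror(fp)))
        return false;
    *out = c;
    return true;
}
```

On mainstream platforms the extra feof()/ferror() check is redundant, because getc() never returns EOF for a successfully read char there.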

kstenerud 2289 days ago

... which is why no system ever implements things this way. There are many portions of the C spec that can be ignored.

Large swaths of the C standard were built during the heyday of computer design, when you had all sorts of wacky sizes, behaviors and abstractions. Lots of "undefined behavior" is effectively deterministic, because all modern computers have converged to do so many things the same way.

plorkyeran 2288 days ago

TI DSPs with 16-bit char are still being made. It's a niche thing that most people will never need to care about, but it's not just a historical quirk and definitely not "no system ever".

int_19h 2288 days ago

Then there's SHARC with its 32-bit char.

Do architectures like that have non-freestanding C implementations, though? It's kinda moot if there's no getchar()...

snek 2289 days ago

> and did you know that IEEE-754 floating point can represent a "negative zero" that you can use for an error code in functions that return float or double?

I am begging, please never ever do this. NaN literally exists for this reason. NaN even allows you to encode additional error context and details into the value.
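A rough sketch of what that looks like in C, assuming an implementation that supports quiet NaNs (the `NAN` macro from `<math.h>`); `mean` is a hypothetical example function:

```c
#include <math.h>
#include <stddef.h>

/* Arithmetic mean of n values. Returns NaN when there is nothing to
   average, so the caller can detect the failure with isnan(). */
double mean(const double *xs, size_t n)
{
    if (n == 0)
        return NAN;             /* quiet NaN signals "no result" */

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += xs[i];
    return sum / (double)n;
}
```

The caller checks with `isnan(result)` rather than `==`, since NaN compares unequal to everything, itself included. C also has `nan("tag")` for building a NaN from a tag string, but how the tag maps to payload bits is implementation-defined, so it is left out here.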

saagarjha 2289 days ago

+DBL_MAX. Negative zero is an entirely valid, if rare, result of certain computations.
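To see why negative zero makes a poor sentinel, here is a small sketch assuming IEEE 754 doubles; `is_negative_zero` is a hypothetical helper. Ordinary arithmetic such as `-1.0 * 0.0` produces it, and it compares equal to positive zero, so only the sign bit tells the two apart:

```c
#include <math.h>
#include <stdbool.h>

/* True only for negative zero. x == 0.0 holds for both zeros, so the
   sign bit has to be inspected separately with signbit(). */
bool is_negative_zero(double x)
{
    return x == 0.0 && signbit(x) != 0;
}
```

A caller that tested for the "error" with `result == -0.0` would also match every legitimate positive zero.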

pwdisswordfish2 2289 days ago

IEEE 754 has infinities as well, no need to constrain yourself to DBL_MAX :)

fennecfoxen 2289 days ago

I taught myself C on MS-DOS in middle school, decades ago. Could have sworn that ASCII 26 was named “EOF” even if modern text files don’t include it.

This is a supplementary source of confusion.

wongarsu 2289 days ago

Wikipedia supports this:

> Character 26 was used to mark "End of file" even if the ASCII calls it Substitute, and has other characters for this. Number 28 which is called "File Separator" has also been used for similar purposes. [1]

I think today we would think of character 4 (End of Transmission, Ctrl-D) as the end of file/input marker, but historically Character 26/Ctrl-Z was used, even on disk.

1: https://en.wikipedia.org/wiki/Substitute_character

pwdisswordfish2 2289 days ago

See, this is why you should not believe Wikipedia.

The DOS syscall interface has no concept of an EOF character. ^Z being considered EOF was a feature of the COPY command, later replicated by the runtimes of various languages targeting DOS.

http://jdebp.info/FGA/dos-character-26-is-not-special.html

Doctor_Fegg 2289 days ago

Not just DOS. CP/M also used CTRL-Z, principally because file lengths weren’t stored on disk - just the list of 128-byte blocks. So to get granularity beyond multiples of 128, you need an explicit EOF character.
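A sketch of how a reader might recover the logical length of a text file stored as whole 128-byte records, with the tail padded by Ctrl-Z. The function name and the exact padding behavior are assumptions for illustration, not a description of any particular CP/M program:

```c
#include <stddef.h>
#include <string.h>

#define CTRL_Z 0x1A             /* ASCII 26, officially "Substitute" */

/* Logical length of text held in a buffer of whole records: the number
   of bytes before the first Ctrl-Z, or the full stored length when no
   Ctrl-Z is present. */
size_t text_length(const unsigned char *buf, size_t stored_len)
{
    const unsigned char *p = memchr(buf, CTRL_Z, stored_len);

    return p ? (size_t)(p - buf) : stored_len;
}
```

The same scan also explains the trick of hiding binary data after a textual header: anything following the first Ctrl-Z is simply never shown by a reader that stops there.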

giovannibajo1 2289 days ago

I think TYPE would also treat ^Z as a terminator of the file. I think it was common in DOS to have binary files with a textual header followed by ^Z, that would hide the binary part.

nixpulvis 2289 days ago

Yea, it's confusing. https://news.ycombinator.com/item?id=22572703, read EDIT 3, I found this pretty illuminating.

nixpulvis 2289 days ago

What does "Procedural" vs "Functional" have to do with this? It's a choice in data type.

If by procedural you mean, nonsense, then sure... I agree that a function named `getchar` returning an `int` is procedural. :P

mikekchar 2289 days ago

I suspect it was a product of the OP's musing about errors. Side effects are common in programming languages outside of pure functional languages. When you have a pure functional language, what do you do if the type you are returning can't represent an error? You also can't have side effects (for example throw an exception), so it's doubly important that you make sure your return type can encode errors. I suspect that's all they meant. The choice of wording was just unfortunate (especially the use of "procedural" -- what do I do if I can't return values??? ;-) ).

pwdisswordfish2 2289 days ago

Nothing, apart from the fact that languages with type systems designed more carefully than C happen to be functional languages, to one extent or another.

(Though by the way: having functions that evaluate to a value when executed is itself a feature that belongs to the functional paradigm, although one so trivial and common that it’s not usually thought of as such. But a purely imperative/procedural way of returning values would be via out parameters or global variables.)

jfdhvdybc 2289 days ago

The simple answer to this is that these days "functional programming" doesn't just mean the absence of side effects. It means strong type systems, algebraic data types, list comprehensions, etc. It is a distinct cultural stream in the development of programming languages. Of course "functional" has an original narrow meaning, but so do "Republican" and "Democrat".

When Rust introduced ADTs they were recognizably a concept from functional programming. It's a place or community of practice, not a purely descriptive adjective.

eyegor 2289 days ago

What they mean to say is: when I was working with a language that enforced pure functions, I had to actually consider purity. It's rare to see a way to enforce purity in procedural languages, whereas most fp langs support it.

nixpulvis 2289 days ago

Are we talking about even roughly the same concept of functional purity [1]? Nothing is stopping a pure function from representing EOF as -1.

Implementing IO in a "pure" way is, however, another discussion.

[1]: https://en.wikipedia.org/wiki/Pure_function

eyegor 2288 days ago

Mostly. Do you know of a single procedural language with a concept of IO monads in its stdlibs?

chrisseaton 2289 days ago

> If by procedural you mean, nonsense, then sure

Why are you being snarky?

They clearly mean the issue of modelling partial functions which would normally be done by a side-effect in a procedural language but can’t in a functional language.

nixpulvis 2289 days ago

No, they imply that the handling is done by returning a negative number.

I'm being snarky, as is my nature, to highlight the madness of a function called `getchar` returning anything but a `char`.

samatman 2289 days ago

It's not a great snark given that the C standard considers the signedness of char to be implementation defined, making -1 a valid option, sometimes.

nixpulvis 2289 days ago

I'm sorry you don't find it great (I still do). Integers are not characters.

Integers are numbers like -1337, 0, and 42.

Characters are things that compose strings of text.

These are not the same kind of thing at all. Just because APIs may be leaky, and some of these APIs are held in very high regard doesn't change that fact.

astrobe_ 2289 days ago

In the end, integers, floating point numbers, "text\n", emojis etc. are just sequences of bytes. You choose to acknowledge it and take advantage of it, or you don't.

samatman 2289 days ago

A char isn't a character, though. You can't add two characters together and get another character. It's a number.

getchar() gets a char, not a character.

chrisseaton 2289 days ago

> the madness of a function called `getchar` returning anything but a `char`

It’s effectively returning a Maybe(char).

nixpulvis 2289 days ago

But it's not.

A `Maybe<char>` has exactly one `None` variant, while an `int` has many, many negative values.

Also, just calling it `None` (or similar) makes clear what is meant, while `-1` is some magic value.
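For comparison, something closer to a `Maybe<char>` can be sketched in C with a small struct. The names `maybe_char` and `get_maybe_char` are hypothetical, and the sketch folds end-of-file and read errors into the single "no value" case:

```c
#include <stdbool.h>
#include <stdio.h>

/* An explicit "maybe a char": exactly one empty state, and the value
   field is meaningful only when has_value is true. */
typedef struct {
    bool has_value;             /* false means EOF or a read error */
    unsigned char value;
} maybe_char;

/* Wrap getc() so the caller never sees the int/EOF encoding. */
maybe_char get_maybe_char(FILE *fp)
{
    maybe_char m = { false, 0 };
    int c = getc(fp);

    if (c != EOF) {
        m.has_value = true;
        m.value = (unsigned char)c;
    }
    return m;
}
```

Unlike a real sum type, nothing here stops a caller from reading `value` without checking `has_value` first; C only offers the convention, not the enforcement.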

chrisseaton 2289 days ago

> while `-1` is some magic value

It's a documented return value. Nothing magic about it.