by rectang 2294 days ago
Like NULL, confusion over EOF is a problem which can be eliminated via algebraic types.

What if instead of a char, getchar() returned an Option<char>? Then you can pattern match, something like this Rust/C mashup:

   match getchar() {
     Some(c) => putchar(c),
     None => break,
   }
Magical sentinels crammed into return values — like EOF returned by getchar() or -1 returned by ftell() or NULL returned by malloc() — are one of C's drawbacks.

What always annoyed me about C is that it has all the tools to simulate something approaching this, save for some purely syntactical last-mile shortcomings. We can already return structs; if only there were a way to neatly define a function returning an anonymous struct, and immediately destructure on the receiving end. Something like:

  #include <stdio.h>

  struct { int err; char c; } myfunc() {
    return { 0, 'a' };
  }

  int main(int argc, const char *argv[]) {
    { int err; char c; } = myfunc();
    if (err) {
      // handle
      return err;
    }
    printf("Hello %c\n", c);

    return 0;
  }
This is (semantically) perfectly possible today; you just have to jump through some syntactic hoops, explicitly naming that return struct type (because, among other things, anonymous structs, even when structurally equivalent, aren't equivalent types unless they're named...). Compilers could easily do that for us! It would be such a simple extension to the standard with, imo, huge benefits.

Every time I have to check for in-band errors in C, or pass a pointer to a function as a "return value", I think of this and cringe.
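For comparison, here is a minimal sketch of the "syntactic hoops" version that compiles as standard C (C99 or later). The struct name `myfunc_result` and the helper `greet` are made up for illustration; the original sketch deliberately left the struct anonymous.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical name: the sketch above left this struct anonymous. */
struct myfunc_result {
    int err;  /* 0 on success, nonzero error code otherwise */
    char c;   /* meaningful only when err == 0 */
};

struct myfunc_result myfunc(void) {
    /* A compound literal replaces the sketch's bare `return { 0, 'a' };`. */
    return (struct myfunc_result){ 0, 'a' };
}

/* Hypothetical helper: formats the greeting into buf, or returns the error code. */
int greet(char *buf, size_t size) {
    assert(buf != NULL && size > 0);
    struct myfunc_result r = myfunc();  /* no destructuring; fields are read by name */
    if (r.err) {
        /* handle */
        return r.err;
    }
    snprintf(buf, size, "Hello %c\n", r.c);
    return 0;
}
```

The differences from the wished-for syntax are the named type, the compound literal in the return statement, and the `r.err` / `r.c` field accesses in place of destructuring.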

You can write that in C++17 with only slightly different syntax (and it is actually really nice for being C++):

  #include <stdio.h>
  #include <tuple>

  std::tuple<int, char> myfunc() {
    return { 0, 'a' };
  }

  int main(int argc, const char *argv[]) {
    auto [ err, c ] = myfunc();
    if (err) {
      // handle
      return err;
    }
    printf("Hello %c\n", c);

    return 0;
  }
You may be interested in tagged unions: a struct with an enum and a union. You can switch on the enum.

More stuff like this in https://pdfs.semanticscholar.org/31ac/b7abaf3a1962b27be9faa2...
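A minimal sketch of such a tagged union in C, applied to the getchar() case from the top of the thread. All names here (`read_tag`, `read_result`, `read_byte`, `count_until_eof`) are hypothetical. It also assumes that `fgetc()` sets `errno` on failure, which POSIX specifies but ISO C does not.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

enum read_tag { READ_BYTE, READ_EOF, READ_ERROR };

struct read_result {
    enum read_tag tag;          /* says which union member, if any, is valid */
    union {
        unsigned char byte;     /* valid when tag == READ_BYTE */
        int err;                /* errno snapshot, valid when tag == READ_ERROR */
    } as;
};

/* Wraps fgetc() so that "byte", "end of file" and "error" are separate cases. */
struct read_result read_byte(FILE *f) {
    assert(f != NULL);
    struct read_result r = { READ_EOF, { 0 } };
    errno = 0;
    int c = fgetc(f);
    if (c != EOF) {
        r.tag = READ_BYTE;
        r.as.byte = (unsigned char)c;
    } else if (ferror(f)) {
        r.tag = READ_ERROR;
        r.as.err = errno;       /* assumption: POSIX-style errno reporting */
    }
    return r;
}

/* Counts the remaining bytes in f; returns -1 if a read error occurs. */
long count_until_eof(FILE *f) {
    long n = 0;
    for (;;) {
        struct read_result r = read_byte(f);
        switch (r.tag) {        /* switch on the enum, as suggested above */
        case READ_BYTE:  n++; break;
        case READ_EOF:   return n;
        case READ_ERROR: return -1;
        }
    }
}
```

Unlike a real algebraic type, nothing stops a caller from reading `r.as.byte` without checking `r.tag` first; the discipline is by convention only.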

Sounds like you'd like Go, which works this way.
Which is a strictly inferior and botched way to go about it, especially since golang was designed from scratch.
> We can already return structs

AFAIK, no? You can return a pointer to a struct, and you can pass whole structs as arguments, but not, IIRC, return them from functions.

EDIT: Apparently you can, sort of, but not portably; how exactly it is defined to work depends on the compiler, and each compiler might define it differently. This means that if you’re using a library which returns a struct and your program uses a different C compiler than the library used when it was compiled, your program will not work. I.e. there is no one defined stable ABI for functions returning structs.

Therefore I think it’s reasonable to regard it as impossible in practice.

Structs are values and you can return them like any value (or use them as parameters).

I'm not sure what you mean about compilers.

Xe is conflating compilers and calling conventions a bit. The way that structure types are returned varies by calling convention, as indeed do a lot of other things. Mismatched calling conventions lead to problems.

But structure type return values are well specified for most calling conventions, and quite a number of compilers support explicitly specifying the calling convention for mixed-language or mixed-compiler situations.

* http://jdebp.uk./FGA/function-calling-conventions.html

Many calling conventions apparently use a method for returning structs which is inherently non-thread-safe.

Also from that link:

> 32-bit cdecl calling convention

> For return values of structure or class type, there is wide incompatibility amongst compilers. Some make the return thread-safe, by breaking compatibility with the 16-bit cdecl calling convention. Some retain compatibility, at the expense of their 32-bit cdecl calling convention not being thread-safe. The ones that break compatibility don't all agree with one another on how to do so.

Oh, I get it.

This is mostly not a practically relevant issue. (Nor are pre-K&R compilers relevant, although something like this could arise among modern compilers.) As far as oddball situations go, it's far from the thorniest to deal with - it doesn't even involve C++.

“What if instead of a char, getchar() returned an Option<char>?”

Getchar doesn’t return a char; it returns an int (https://en.cppreference.com/w/c/io/getchar).

⇒ if C didn’t do automatic conversions from int to char, we would have that (in a minimalistic sense)

That wouldn’t work for ftell and malloc (and, in general, most of the calls that set errno), though.

> Getchar doesn’t return a char; it returns an int

Dammit, I knew that. Thank you for flagging my blunder; being precise is really important in this case. The Linux manpage better explains the return value of getchar:

https://linux.die.net/man/3/getchar

"fgetc(), getc() and getchar() return the character read as an unsigned char cast to an int or EOF on end of file or error."

getchar() needs to return an object the width of an unsigned char, but all the values in that range are taken by possible character values. The return type had to be expanded to int in order to accommodate the sentinel.
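A small sketch of why the wider return type matters, using two hypothetical functions. The second one is buggy on purpose. Its behaviour as described assumes the common case where `EOF` is -1 and converting 255 to `signed char` wraps around to -1, which is implementation-defined rather than guaranteed.

```c
#include <assert.h>
#include <stdio.h>

/* Correct: the result of fgetc() stays in an int, so EOF differs from every byte value. */
long count_bytes(FILE *f) {
    assert(f != NULL);
    long n = 0;
    int c;
    while ((c = fgetc(f)) != EOF) {
        n++;
    }
    return n;
}

/* Buggy on purpose: after narrowing to signed char, a 0xFF byte in the data
 * typically becomes -1 and is mistaken for EOF, so counting stops early. */
long count_bytes_narrowed(FILE *f) {
    assert(f != NULL);
    long n = 0;
    signed char c;
    while ((c = (signed char)fgetc(f)) != EOF) {
        n++;
    }
    return n;
}
```

Under those assumptions, on a stream holding the three bytes 'a', 0xFF, 'b', the first function counts 3 and the second stops after 1. Narrowing to `unsigned char` instead fails the other way on typical platforms: the stored value never compares equal to `EOF`, so the loop never ends.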

The alternative of using an algebraic type is superior because the end-of-stream condition has a different type (so to speak), and furthermore, the programmer has no choice but to deal with it because the character value comes wrapped inside an Option which must be stripped away before the character value can be used.

Really, you also want the type system to express all possible error conditions as well, since getchar() returning EOF can mean either that end-of-file was reached or that some other error occurred!

As someone who has written lots of C code and worked hard to account for all possibilities manually, I really appreciate it when the type system and APIs can express all possibilities and back me up.

> Magical sentinels crammed into return values — like EOF returned by getchar() or -1 returned by ftell() or NULL returned by malloc() — are one of C's drawbacks.

They're part of the C standard library. The POSIX I/O APIs don't have these problems. The Linux I/O system calls are even better because they don't have errno.
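One reading of this comment, sketched with the POSIX `read()` call: end of input is reported as a byte count of 0, separate from the data in the buffer, so no byte value has to double as a sentinel. Errors are still signalled by a return of -1 together with `errno`. The function name `drain_fd` is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Reads fd until end of input. Returns the total number of bytes read,
 * or -1 with errno set if read() fails. */
long drain_fd(int fd) {
    char buf[4096];
    long total = 0;
    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;         /* n bytes of data are now in buf */
        } else if (n == 0) {
            return total;       /* end of input: a count of zero, not a magic byte */
        } else if (errno == EINTR) {
            continue;           /* interrupted by a signal; retry */
        } else {
            return -1;          /* real error; errno says which */
        }
    }
}
```

Calling `drain_fd` on the read end of a pipe after the writer has closed its end returns the number of bytes that were written; calling it on an invalid descriptor returns -1 with `errno` set to `EBADF`.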

Honestly, the C standard library just isn't that good. Freestanding C is a better language precisely because it omits the library and allows the programmer to come up with something better.

I think that's being too kind. The C standard library is terrible.
To be fair, the libraries found in other languages aren't much better. Ruby's standard library was the most comfortable in my experience but it still has glaring flaws.
So `read`'s `Ok(0)` result is akin to `getchar`'s `None` result here. A different API means a little more to consider, but it generally makes sense.
Option<u8>, given that in C ‘characters’ means bytes, not code points.
A byte could be 6, 7, 8, or 9 bits depending on platform.
Yes, but Rust doesn’t support those. So on platforms where both C and Rust run, bytes will be 8 bits.

Either way, no platform defines bytes to be Unicode code points.

Or allow multiple return values like Go. EOF gets returned as an explicit error value and the io.Reader interface is standardized and widely used.
It seems weird that Go considers EOF to be an error condition. Reaching the end of a file is a normal, expected outcome of reading files.
By making error values explicit and handling them necessary, Go makes all error conditions expected outcomes.

Whether this is an advantage is heavily domain dependent.

golang's approach is inferior and error prone. We already have better designed languages.
> What if instead of an int, getchar() returned an Option<char>?

That would be the textbook case of stupid over-engineering.

I strongly disagree. The existing getchar() API is not simple at all! All the possible error conditions are still there, they're just obscured by an overstreamlined API which fuses them inappropriately into a single return type. That makes it harder to handle all cases well, because you have to do all the work manually.
The man page for getchar is a single, easy to read paragraph. To understand algebraic types you need a couple of textbooks.
No, encoding additional information in unused bits of an int that you return is stupid over-engineering that needs multiple textbooks to grok. Option<char>, on the other hand, is the simplest possible solution for this problem.
I would also like to note that encoding the Option<char> using the unused bits of the return value is a perfectly valid implementation. But that is exactly what it is, an implementation detail. It could work exactly the same way as today but the programmer wouldn’t have to care about how it was implemented, just whether they got a char or None.
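A rough C approximation of that idea, with hypothetical names throughout: the value is still stored as the same int that fgetc() returns, but it is hidden inside a one-member struct, so callers have to go through accessor functions instead of comparing against EOF themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical wrapper: raw holds either EOF ("none") or a byte value 0..UCHAR_MAX. */
typedef struct { int raw; } opt_byte;

/* Reads one byte from f; the EOF sentinel stays an internal detail of opt_byte. */
opt_byte opt_byte_read(FILE *f) {
    assert(f != NULL);
    opt_byte o = { fgetc(f) };
    return o;
}

bool opt_byte_is_some(opt_byte o) {
    return o.raw != EOF;
}

/* Returns the byte; calling this on "none" is a programming error caught at run time. */
unsigned char opt_byte_unwrap(opt_byte o) {
    assert(opt_byte_is_some(o));
    return (unsigned char)o.raw;
}
```

On typical compilers `sizeof(opt_byte)` equals `sizeof(int)`, so the wrapper costs nothing in storage. Because it is a struct, assigning it directly to a `char` does not compile. What C cannot do is force the caller to check `opt_byte_is_some` before `opt_byte_unwrap`; that check only happens at run time here.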
In fact, that's exactly what Rust already does today! Option<char> uses the exact same number of bits as a plain char, because the compiler has enough information to encode the Option-ness of char in a bit pattern that it knows is never a valid char.

https://play.rust-lang.org/?version=stable&mode=debug&editio...

Rust guarantees this, actually.
> No, encoding additional information in unused bits of an int that you return is stupid over-engineering that needs multiple textbooks to grok. Option<char>, on the other hand, is the simplest possible solution for this problem.

What kind of wicked education did you have for this to be the case?

My dad taught me about bits and bytes and words when I was a kid, and by 16 I had a quite solid grasp of it (without any textbook). Then I studied several years and got a PhD in applied math (mostly numerical PDE, and that involved a lot of programming). Then I have spent 15 more years doing math and programming in several languages (mostly C and Python) and getting paid for teaching data science and signal processing to people who went on to have fruitful jobs in industry. Today, I read the Wikipedia page about "option type" [1] and the one about type theory [2], which seems a prerequisite, and couldn't understand a word.

[1] https://en.wikipedia.org/wiki/Option_type

[2] https://en.wikipedia.org/wiki/Type_theory

Surely if you have a PhD in applied math you've seen that Wikipedia will often foreground dense theoretical issues, even for topics with straightforward practical applications.

You do not need to understand theoretical type theory to understand options. It's just like a pointer that can be NULL except the compiler makes sure you can't accidentally dereference it if it is. Algebraic data types in general are basically just structs and tagged unions, except the compiler makes sure you can't screw the tags up.

Like, dude, by your own account, you're pretty smart; that's the point of your last paragraph, right? There are, at this point, hordes of Rust and Scala and Swift and Kotlin programmers who can figure out how option types work, and don't seem to have too much of a problem with it and pretty much universally think they're great. Are they actually just smarter than you?

I’m a community college drop out and have never taken a CS class and I use options in my code all the time.

It’s just a wrapper around some value that is either Some(value) or None and you need to unwrap it and handle both possibilities for your code to compile.

You don’t need to know anything about monads or ADTs to understand it.

You can amortize that cost over all of the problems in the language's domain, not just getchar.