Hacker News new | ask | show | jobs
by orthoxerox 29 days ago
I wasn't in this class myself, but one prof at my alma mater started his "Programming 201" class with the simplest assignment: write a C program that accepts two integers from the user and prints their sum. It actually was the only assignment for the rest of the semester, since he has a test suite that would humiliate the students gently at first, but would ultimately pipe a billion nines into stdin as the first argument.
6 comments

It's a little awkward, because you'd need to parse the strings in reverse, but if all you need to do is sum, you can do it one digit at a time, while at any given moment only handling only one character from each input string, a carry byte, and one output character.
You don't need to parse the strings in reverse. That's for printing integers, not parsing. Roughly:

    int stdin_atoi() {
      int i = 0;
      while (1) {
        int c = getchar();
        if (c >= '0' && c <= '9') {
          i = i * 10 + (c - '0');
        } else { break; }
      }
      return i;
    }
That covers the ‘int’ case, but not the ‘integer’ case described. Unless you have unlimited memory, you’ll need to go least- to most-significant digit; but you’ll need to do that on both inputs, which doesn’t really work with the interface described unless at least the first argument first in memory all at once, so… well, I assume “I under specified this problem and it’s impossible” is the point of this sort of exercise.
That method requires storing an arbitrarily large number, whereas for the least-significant-character-first method, the math itself could be done without using any more data than two input bytes, one of which could double as an output byte, and a carry byte.
How do you know where the first string ends and the second starts? Did you miss the "stdin" part?

This is not

    ./program first_number second_number
Of course the method described requires both input and output buffers, because everything is processed last-character-first.

Now that you mention it, if the assignment had called for arguments, instead of files or pipes, argv points to a writable array, so the result could be written directly to it, negating any need to allocate memory, and any out-of-memory conditions from large input data would occur before the program is even called.

If it usually uses a file to store the numbers, the same could be done by writing the result back to the file, but that only works if it is passed as an argument, as piping it would throw a seek error. I wonder if the instructor would accept an interleaved little-endian input syntax, with a little-endian output; then the program could use pipes without a need to seek. An infinite series of '9' characters would output an '8', followed by one '9' per two input characters.

This would be kind of a fun challenge. If you are handling random numbers, well you are limited by disk or memory size. But if the numbers are compressible ala LZ77 or Gzip, then there are ways to use the value’s compression trees to sum the numbers from the least significant digits using the LZ77 style compressed value tree representation. If you go that route, and the numbers are compressible (not random) then the question is whether the compressed input and output trees fit in memory or disk.
Would be fun to write a program that arranges to send the input into dc(1) and just outsource the whole problem to Ken or Rob or whoever wrote it. :)
It would be fun, but were I the teacher I'd commend you for your ingenuity, and then ask you to return to your desk to complete the assignment.
That’s golden!

Would make an excellent “interview question from Hell”!

Perfect is the enemy of good.
That's true for product development, but it's not true for mathy libraries. Perfect is achievable. For a released software that humans will decide to use or not, rapid iteration is great. But also: https://randomascii.wordpress.com/2014/01/27/theres-only-fou...

Precision and exactitude and formally proven correct software can exist in some problem domains, and it's kind of silly to not achieve that when it's achievable.

But in this case, C is not "good". It is more like "abysmal". "Good" is just producing a correct result or error, with no ambiguity which case applied and no UB. "Perfect" is arguing over the most usable and elegant API for it.
Once a program is available over the internet, hackers are the enemy of merely good programs that don't perfectly validate their input.

"You have to get lucky every time. We only have to get lucky once".

Sure
Could you humor a coding noob--how do you deal with utterly insane inputs like that?
You first ask if you really need to.
Unless you're exposing it to the internet, ever, in the entire future history of the program. Then you kind of have to, in one form or another.
You have to, but you probably shouldn’t do it by trying to add the inputs. That opens a door for DDOS attacks.

Returning an error on inputs that are too long (for some definition of it) is the way to go.

Arbitrary precision arithmetic (GMP, BigInteger, etc). Numbers can take arbitrary amounts of memory, instead of just a single machine word.
At some point you can just refuse. Too many digits. Well time to quit with error.
Crash and report an error.
You report an error and exit cleanly with a proper operating system error code. Crashing is a quick hack, acceptable for throwaway projects but not in software used long-term.
Crashing (in the sense of "give up and exit with an error") on invalid inputs is valid (and often the best thing) in many cases.

Fix your inputs.

I think you're using "crash" to mean "exit early". I am using "crash" in the sense of "this program did something causing the OS to terminate it externally". I suppose that's a real point of difficulty in communication across different programming languages.

We agree that the program should exit early. I think we agree it should do it cleanly and intentionally. I'm adding the constraint that "crash" doesn't necessarily mean "cleanly and intentionally", especially when talking about a C program.

There's a position in between "exit cleanly" and "general protection fault, core dumped" where the process essentially does the internal equivalent of SIGKILLing itself.

I.e. either intentionally (e.g. tripping an assertion failure), or accidentally due to some logic-failure in exception/error-handling, the process ends up calling the exit(3) syscall without first having run its libc at_exit finalizers that a clean exit(2) would run; or, at a slightly higher runtime abstraction level, the process calls exit(2) or returns from main(), without having run through the appropriate RAII destructors (in C++/Rust), or gracefully signalled will-shutdown to managed threads to allow them to run terminating-state code (in Java/Go/Erlang/Win32/etc), or etc.

This kind of "hard abort" often truncates logging output at the point of abort; leaves TCP connections hanging open; leaves lockfiles around on disk; and has the potential to corrupt any data files that were being written to. Basically, it results in the process not executing "should always execute" code to clean up after itself.

So, although the OS kernel/scheduler thinks everything went fine, and that it didn't have to step in to forcibly terminate the process's lifecycle (though it did very likely observe a nonzero process exit code), I think most people would still generally call this type of abort a "crash." The process's runtime got into an invalid/broken state and stopped cleaning up, even if the process itself didn't violate any protection rules / resource limits / etc.

Interesting difference in nomenclature. For me, "crash" absolutely includes intentional early termination. A Rust panic, for example.

I think that that's by far the dominant usage of crash. It would surprise me if someone used the word crash but intended to exclude panics, etc.