Hacker News new | ask | show | jobs
by tba 5273 days ago
I'm confused. What is the "defect" in K&R's "copy(char to[], char from[])" function?

The author notes that "the second this function is called...without a trailing '\0' character, then you'll hit difficult to debug errors", but no function with that signature could possibly work in this case.

The built-in "strcpy" function has the exact same limitation. Does the author have a problem with it as well? Null-termination is a fundamental concept of C strings; there's no reason to shield C students from it.

The other example of "bugs and bad style" in this "destruction" of K&R C is a minor complaint about not using an optional set of braces.

I hope the remainder of the [incomplete] chapter demonstrates some actual bugs in the book's code, because it currently doesn't live up to the first paragraph's bluster.

5 comments

It's a nice strawman, right? Especially when he points out that the original code, in context, is perfectly fine. His later complaint about the assignment-in-if statement is certainly something shared by modern C programmers (see compiler warnings about same), but it perfectly fits the original style and accomplishes its task.

His criticisms seem to be rooted so far in stylistic issues and in taking the code out of context (design context, usage guarantees, etc.). Then again, how are you to nerdbait while still being fair to original sources?

My criticisms are only partially stylistic, but also that people copy this code to other situations and it breaks. So, I'm showing them why it only works in that one specific context and how to break the code.
If I can make a suggestion, then. I apologize if you've already covered this elsewhere in your book, but:

Please encourage the use of the "restrict" keyword to encourage proper aliasing optimization by compilers.

Also, would it be worth considering, perhaps in a chapter on how-to-deal-with-C-style-strings-if-you-must, including a section on writing compiler-independent macros to emulate safer calls like the str*_s variants in VS. These functions are usually slightly-different in incantation across VS and GCC, so it might be helpful to abstract them. It also is a relatively simple exercise in preprocessor macros with a clear benefit.

The built-in "strcpy" function has the exact same limitation. Does the author have a problem with it as well?

Yes. From the linked chapter:

we avoided classic style C strings in this book

From an earlier chapter on strings:

The source of almost all bugs in C come from forgetting to have enough space, or forgetting to put a '\0' at the end of a string. In fact it's so common and hard to get right that the majority of good C code just doesn't use C style strings. In later exercises we'll actually learn how to avoid C strings completely.

This is the author's opinion, of course – it's from a book, that should go without saying – but it's not as if the idea of avoiding C strings in general, and "strcpy" in particular, is an oddball or unique point of view. See e.g.:

http://stackoverflow.com/questions/610238/c-strcpy-evil

"the majority of good C code just doesn't use C style strings..."

Nice. In the 23 years I have worked on C language products, I've never worked on "good C code" by this definition.

The cool thing about this guys book, I guess, is that by avoiding all the things about the language he doesn't like, any reader will be wholly unprepared for C in the Real World after this book.

This chapter is explicitly teaching people what the author's idea of bad C code is: It has them read some, then tells them specifically what the gotchas are, then asks them to code up test cases for the flaws and run them through Valgrind.

Where's the "avoiding" here?

And which of the skills being exercised - imagining what kinds of bad things could happen, writing executable test cases, detecting segfaults - are not useful in the real world?

That's basically the point of this chapter. It's getting people to think like a hacker and try to break the code in unintended ways. That makes them better programmers and helps when avoiding common mistakes in C code.

Using K&R to do this is to give people a set of known good C code samples and show how even those can be broken and misused.

Noble goal, but the way the chapter is written doesn't really come across that way, in my opinion.
Speaking as someone who doesn't know C, hasn't read K&R, but who (usually!) has decent reading comprehension, it _does_ come across this way.

He's completely explicit about what he's doing and why. I don't understand how people can read that and still feel that he's being unfair.

Maybe I'm just too literal-minded.

How about being able to work on a team of other programmers without insisting that everything ought to be done you own, completely unique, way?

If he truly had a better, more correct, way to write C code I would join him in trying to change the world. But the first example he gives is just factually wrong.

"we avoided classic style C strings in this book"

There's the avoiding. The skills being exercised are not a problem, it's the skill not being exercised that is a problem. You cannot be a competent C programmer without understanding strings and their relevant library functions.

Yes, they'll be pretty good at C unlike you because they'll read the whole book instead of just this one half-finished chapter.
"they'll be pretty good at C unlike you"

Thanks for the code review of my last 23 years of work on C projects. Nice ego that your book must be the only way to learn this.

I mostly disagree with your statement about "most good C code" not using stdlib strings/apis.

To be fair, from your comments here, it looks like you aren't really saying that -- you are saying we have to be careful when using them. This I know, and carefully guard against, and I do try to break my code regularly. But to assert that "most good C code" doesn't use them fails to meet with my experience.

That said, _all_ bad C code (ab)uses them, probably due to carelessness and/or misunderstanding of how things work in C.

The approach looks similar to the one in "JavaScript: the good parts" I am worried about the same while reading it: ok, here's the right way to do e.g. inheritance, but how do they do it in real world?
> unprepared for C in the Real World

Oh please. Most C in the "Real World" is utter crap. It doesn't mean there is no place for books teaching "alternative" approaches.

I didn't see it yet from a quick skim of the available chapters, but I'm going to guess that his preferred alternative is 'bstr - The Better String Library' - http://bstring.sourceforge.net/

I've used it the past, and I try to include it in any project that isn't immediately trivial, although the to/from integration with just about all library code ever is a bit of a hassle.

It's not just that the to/from is a hassle, it's also that you have to deeply understand the gotchas of ASCIIZ strings to safely do that bridging at all --- so pretending it doesn't exist is probably not a good teaching strategy. (I have no opinion about the article and am only commenting on this thread because I like C).
I do not pretend anything about null-terminated strings in the book. In fact, I have many exercises and assignments where they use Valgrind to buffer overflow C strings repeatedly and don't introduce bstring until much later.
Sorry, I haven't read your book, meant to be clear that I was talking in the abstract, but I can see after rereading I wasn't clear enough.
I use bstring but there's a few others. Generally it's the easiest way to avoid some common buffer overflows and other problems.
> but no function with that signature could possibly work in this case.

This is the source of the bugs in C. People write functions that only work given all calls to them are never changed, which is absurd. Good modern C code involves trying to protect against bad usage and adding defensive checks.

So yes, the built-in strcpy is crap which is why most competent C doesn't use it except in a few rare cases where it's required.

And this does demonstrate actual bugs in the code. I wrote a test case that causes it, which incidentally is a common bug in C code called a buffer overflow. It's because of code examples like this that get copied to other situations that we have these defects.

High level string libraries are a win.

But you may be overstating your case a bit.

From my codebase/third-party directory on my laptop (a bit random, I admit), from those projects I'd consider "competent C" (ie, not OpenSSL or MRI ruby):

* dovecot uses ASCIIZ strings and libc string functions

* redis uses ASCIIZ strings and libc string functions

* nginx uses a high-level buffered string library

* lcamtuf's skipfish scanner uses ASCIIZ strings and libc string functions

* libevent uses ASCIIZ strings and libc string functions

* qmail uses djb's string library

* memcached uses ASCIIZ strings and libc string functions

It's probably good to be comfortable with both approaches.

I don't know that you actually made this claim, but you seem to have given people here the impression that you believe functions that work with ASCIIZ strings should be bulletproofed to handle non-ASCIIZ inputs. I couldn't agree with that argument, especially as an argument about K&R's code being rusty.

People here are jumpy though (they're commenting, like me, mostly because they're bored).

Looking forward to more examples from the book.

Redis strings are actually length prefixed null terminated strings
The string data type in the db? Because Redis itself uses char-stars all over the place, with the libc string functions.
Hmm, reading the source it looks like he is using His sds string library, which has Len, size and a asciiz char* member. When last I checked he does pass the char* around (because it is null terminated) but he also sometimes will do pointer math to get back to the sds
You're right; I was reacting to the count of char\s+star and snprintf calls, but only fair to chalk Redis up among the packages I have that rely on a high-level string library.
this from the man who won't describe xargs to newbies because he doesn't understand it.

blargh

Huh? My C programming is in the distant past, so I might be forgetting. But strcpy does assume a terminal zero, doesn't it? E.g., http://linux.die.net/man/3/strcpy. It sounds like you are talking about strncpy or memcpy.
Yes, strcpy is almost universally understood as a security problem. In fact, here's a nice regex to apply to some C code to find buffer overflows:

LC_ALL=C egrep '[^_.>a-zA-Z0-9](str(n?cpy|n?cat|xfrm|n?dup|str|pbrk|tok|_)|stpn?cpy|r?index[^.]|a?sn?printf|byte_)' src/*.c

Taken from the really well researched and secure andhttpd:

http://www.and.org/and-httpd/#secure

Run that regex on some C code, then go look at how the inputs to those functions are used, and then you can probably create some of your own buffer overflows. It's like magic.

I'm confused by your comment. strcpy assumes that the string to be copied ends with a NUL. The case described in the link violated that assumption and caused a segfault.
To be pedantic, it's not that strcpy is making assumptions.

Its interface is defined such that it is explicitly invalid to pass it some other garbage.

To be both pedantic and correct, there is no way to define a C function to restrict a string input so that it is correctly terminated. So no, it's at best documented that you shouldn't do that and definitely doesn't prevent you from doing this.
To be still more pedantic, there are two kinds of definition of an interface: definition-within-the-type-system and definition-within-the-documentation. The string library is specified in the ISO C standard (I use C99, but you can be all hip and C11 if you want), and passing an unterminated string to strcpy is a constraint violation.

7.1.1 A string is a contiguous sequence of characters terminated by and including the first null character.

7.21.2.3.2 The strcpy function copies the string pointed to by s2 (including the terminating null character) into the array pointed to by s1.

Therefore, code which passes an unterminated string to strcpy is not conforming code (because s2 does not point to a "string" as C99 defines it).

Of course, you should use strncpy anyway. But that's not the point.

/me spends too much time in comp.lang.c :S

Right, the C language type system makes no attempt at describing that kind of constraint, yet it is still clearly defined within the language.

And it's not just the standard C library that defines it...this definition of string is also supported by the literal string data type.

The other time that there will be an issue is when your to is shorter than your from...