Hacker News new | ask | show | jobs
by leodeid 2911 days ago
I haven't touched C in years, but here's my descending "wtf" list:

1. Returns pointer to stack-allocated data, which immediately becomes invalid. Instead, it should be using some sort of allocation (e.g. 'malloc'), or taking in a destination pointer.

2. 'r' is arbitrarily set with length 100. Smaller strings don't need all that space, and larger strings definitely will overrun.

3. The function signature is really awkward. Without any of the surrounding textbook content, I'm not sure what behavior is supposed to happen. At first, I expected something like 'strcat', which takes two char* and appends the second one to the first one. But that isn't happening here and instead it seems to require dynamic allocation. (Hiding allocations inside a function is generally kind of weird. Usually the caller should be responsible for passing in a handle to the destination.)

4. There's no sensical limit on the loop iteration. If the input 't' doesn't have a null terminator, this is going to throw a ton of garbage into the stack space (because 'r' is stack-allocated to a fixed size). And also maybe run for a really long time.

5. 'strcpy' should usually be replaced by 'strncpy', which performs the same function but also requires you to provide a limit ("copy this string, but at most 'n' bytes"). That prevents a class of exploitable errors known as "buffer overruns". I don't know when the 'n' string functions were added to C or became popular, though.

This is a teaching exercise, so the fact that this is implemented as a separate function instead of calling 'strcat' from <string.h> doesn't seem like a big problem.

6 comments

> 'strcpy' should usually be replaced by 'strncpy'

Sorry to butt in, but this is a bit of a trigger for me: I’ve had to fix a number of programs infected with this idea.

The main problems with strncpy are:

When the source string is shorter than n, strncpy will pad the target to n bytes, filling with zeros. This is bad for performance.

When the source string is longer than n, strncpy will copy n bytes but _not_ nul-terminate the target. So you need extra schenanigans every time you use it to cover this case.

So strncpy is hardly ever a good idea. Sadly there is no standard replacement that is widely accepted. More details at https://en.wikipedia.org/wiki/C_string_handling#Replacements

I agree with you completely, and in general think the whole idea of using "safe" string functions with built-in buffer length checking is wrong because it is a solution to a symptom, not a cause.

Before writing to the buffer you should've ensured that it's big enough, and decided what to do if it's not, long before actually doing it. In other words, what happens if it's not big enough? These "always use $length_checking_function" proponents miss that point. Yes, you've avoided an overflow here, but chances are something was already too small long before the flow reached here, and the fix is not to replace an overflow with truncate/not copy/etc. here, but fix the check/sizing that came before elsewhere.

> Before writing to the buffer you should've ensured that it's big enough, and decided what to do if it's not, long before actually doing it. In other words, what happens if it's not big enough?

If you planned all this out, you're still making an assertion as to the length. The contract is "give me a string of this length" and if that's not enforced by the compiler, it ought to be enforced at runtime so that the error is detected and dealt with as soon as possible.

So maybe "safe string functions" should really be "fail fast string functions."

> Sadly there is no standard replacement that is widely accepted.

True, but practically: when having access to BSD extensions, strl* are used. If only C standard is available, snprintf is preferred. I have seen C libs that will check for strl* availability, and if not, reimplement them using snprintf.

So for portability, snprintf is the way to go. For correctness, and pushing for their extended use, strl* is nice.

The least bad options seem to be strl* or snprintf(..., “%s”, ...) but yeah nothing is perfect.
There's strlcpy, but it's not part of POSIX unfortunately.

  #define strlcpy(d, s, n) snprintf(d, n, "%s", s)
Not quite the same (different return type) but close.
> 5. 'strcpy' should usually be replaced by 'strncpy'... That prevents a class of exploitable errors known as "buffer overruns".

To be honest, strncpy is barely better in this respect (as a security improvement) - truncating against arbitrary size limit in this day and age of text-only protocols... I wonder if outright crashing at the testing stage would be preferable rather than subtle misbehavior creeping into the release.

Both are bad IMO, the actual required buffer size should be known in advance.

Raw null terminated strings are just a bad idea. `std::string` and the bafflingly just-introduced `std::string_view` are the right way to handle strings. We can spare the bytes now.
> Raw null terminated strings are just a bad idea.

Absolutely agree. Scanning the memory until "we find it", potentially crossing boundaries between segments of memory with different characteristics (caching etc.) just doesn't seem right in general, and if I recall, some CPUs even used to have published errata related to that.

> std::string` and the bafflingly just-introduced `std::string_view` are the right way to handle strings.

I'd even go straight to custom implementation of Hollerith strings. Literals have lengths known at compile time, protocols would either carry the lengths alongside the strings, or be trusted (to have good strlen behavior) until they do, composite strings would compute the length out of the components, etc. This doesn't seem too complex to do, looking from from my bell tower, but I know many people here would frown upon mentioning C++ in the context of embedded development (my area).

I've mentioned elsewhere that strncpy is not a "safer" strcpy.

Even if it were, there's safety and there's safety. A function (like strncat, for example) that quietly truncates your data if it's too long isn't necessarily better than one that quietly ignores array overruns. Consider what happens if "rm -rf $HOME/tmpdir" is quietly truncated to "rm -rf $HOME/"

strncpy is not safe strcpy, for any value of safe. Period. str in the name is really misleading as it's not really a string function to begin with.
> If the input 't' doesn't have a null terminator

Then it's not a string.

Sure, but it is still a valid 'char *' :)
That's something C's type system doesn't check for. If you want protection for this case, use C++ or any other higher-level language instead.
Well to be pedantic C++'s type system doesn't check for that either it just passes around a size_t and char *.
I was referring to std::string, which is what you should be using if you're handling textual data natively.
The implementation of std::string is 99.999% of the time struct { char *s; size_t len; }, which has nothing to do with an actual type.
And it's even possible to use std::string as a buffer for binary data including NULs. I won't recommend it, but it works.
Also, it allocates ints x and y, saves a value to y, copies it to x, and only uses x from then on. y is entirely redundant.