by ktRolster 3206 days ago
shrug It doesn't fall over. I've done it, the OpenBSD team has done it, DJB has done it. Maybe something is wrong with your implementation that I can help you with?

I'm curious. Got links?
OpenBSD takes a fairly minimalist approach, which is vaguely described here: http://www.freebsdforums.org/forums/showthread.php?threadid=... They basically replace the unsafe functions with things that are easier to use. Their idea is that it isn't the format of the C-string that causes security issues (null-terminated string), it's the poorly defined functions (with weird corner cases that are hard to get right). It's worked well for their use cases.
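
To make the "replace the unsafe functions with things that are easier to use" idea concrete, here is a minimal sketch of a bounded copy modeled on the documented behavior of strlcpy(). It is not OpenBSD's implementation; `bounded_copy` and `copy_fits` are made-up names for illustration. The point is the contract: the destination is always terminated, and the return value tells you whether truncation happened.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper modeled on the documented behavior of strlcpy():
 * copies at most size-1 bytes, always NUL-terminates when size > 0,
 * and returns strlen(src) so the caller can detect truncation. */
size_t bounded_copy(char *dst, const char *src, size_t size)
{
    size_t srclen = strlen(src);

    if (size > 0) {
        size_t n = (srclen < size - 1) ? srclen : size - 1;
        memcpy(dst, src, n);
        dst[n] = '\0';
    }
    return srclen;
}

/* Returns 1 if src fit into dst completely, 0 if it was truncated. */
int copy_fits(char *dst, const char *src, size_t size)
{
    return bounded_copy(dst, src, size) < size;
}
```

Compared with strncpy(), there is no corner case where the result is left unterminated, which is the kind of trap the comment above is talking about.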

DJB did something similar in qmail, I don't recall the details but you can look at the source code as easily as I can, and it eliminated security problems.

When I'm working in Java, I find that most of my string parsing uses the split() function. This is a pain in C, because even if you had a split() function you'd need to deal with memory allocations. Most of these are solved with a memory pool. In my own library, I also added runtime, grammar-based parsing functionality. So to parse a CSV line you might do something like this:

    char *g = " S   -> WORD | WORD , S;"
              "WORD -> [^,]";
    results = parsegram(g, inputString);
Grammar parsing plus memory pools make string parsing in C easier than in Java. The biggest difficulty with this kind of library is that to do it right you need to be something of a Unicode expert, and that's tough.
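
For the split()-with-a-memory-pool part, a rough sketch might look like the following. This is not the library described above (parsegram() is not shown anywhere in the thread); `Pool`, `pool_alloc`, `pool_reset` and `pool_split` are hypothetical names, and the pool is a plain fixed-size bump allocator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size memory pool: allocation bumps an offset,
 * and everything is released at once by resetting the pool. */
typedef struct {
    unsigned char buf[4096];
    size_t used;
} Pool;

void pool_reset(Pool *p) { p->used = 0; }

void *pool_alloc(Pool *p, size_t n)
{
    size_t align = sizeof(void *);
    size_t start = (p->used + align - 1) / align * align;

    if (start > sizeof p->buf || n > sizeof p->buf - start)
        return NULL;                 /* pool exhausted */
    p->used = start + n;
    return p->buf + start;
}

/* Splits `input` on `sep` (which must not be '\0'), storing up to `max`
 * NUL-terminated copies of the fields in `out`. Returns the field
 * count, or -1 if the pool or the output array is too small. */
int pool_split(Pool *p, const char *input, char sep, char **out, int max)
{
    int count = 0;
    const char *start = input;

    for (;;) {
        const char *end = strchr(start, sep);
        size_t len = end ? (size_t)(end - start) : strlen(start);
        char *field;

        if (count == max)
            return -1;
        field = pool_alloc(p, len + 1);
        if (field == NULL)
            return -1;
        memcpy(field, start, len);
        field[len] = '\0';
        out[count++] = field;

        if (end == NULL)
            return count;
        start = end + 1;
    }
}
```

The caller never frees individual fields. One pool_reset() after the line has been processed releases them all, which is what takes the pain out of split() in C.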
I used snprintf(), too, but it is only a minor improvement. Even something as simple as concatenating strings is problematic in C:

    Mystring s,t;
    t = "hello";
    t = cat(s,s);
    t = cat(s,s,s);
    t = cat("hello",s);
    t = cat(s,"world");
    t = cat("hello","world");
Even such a simple use case is fraught with major problems:

1. who allocates needed memory?

2. who frees it?

3. can the compiler constant fold cat("hello","world") ? Does the result wind up allocating memory anyway?

4. what about the lack of function overloading to handle the permutations?

Here's roughly what that would look like using Bernstein's C string library (which was not only used in qmail).

    #include "stralloc.h"
    ...
    static stralloc s, t;
    ...
    if (!stralloc_ready(&s, 0)) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();
    if (!stralloc_cat(&t, &s)) die_nomem();

    if (!stralloc_copy(&t, &s)) die_nomem();
    if (!stralloc_cats(&t, "hello")) die_nomem();

    if (!stralloc_copys(&t, "hello")) die_nomem();
    if (!stralloc_cats(&t, "world")) die_nomem();
Yes, that does work. But it's not without problems, not the least of which is that it's just not attractive to look at. For example, concatenating "hello" and "world" allocates memory, when it should instead give you a "helloworld" string literal. In fact, simply initializing `s` with a string literal needlessly allocates memory, and that's antithetical to performance. Calling die_nomem() leaks memory if it does anything but terminate the program. All those tests for memory exhaustion are tedious.
> Even such a simple use case is fraught with major problems:
>
> 1. who allocates needed memory?
>
> 2. who frees it?

That's also a major feature. It allows people to write systems that are resilient in the face of tight memory limitations. It's not cool when a language forces string operations to allocate & duplicate memory willy-nilly.

> 3. can the compiler constant fold cat("hello","world") ? Does the result wind up allocating memory anyway?

I fail to see how that's a major problem. Why are you concatenating string literals? How common is that?

> 4. what about the lack of function overloading to handle the permutations?

I consider lack of overloading to be a feature. Overloading is one of the things that are way too easily abused, and it makes code auditing harder than it needs to be. Please just type out the different function names so I can see exactly what is going to be called when I read the code. Or use the sprintf family of variadic functions.

It's the opposite. I've seen lots of code written in C that pretends to be out-of-memory safe. I've never once seen such a program that actually is out-of-memory safe. Invariably the code paths triggered by malloc returning null are never exercised.

With a GC and exceptions you can theoretically be quite resistant to OOM conditions, not that anyone really cares.

> I've never once seen such a program that actually is out-of-memory safe. Invariably the code paths triggered by malloc returning null are never exercised.

sqlite takes care to correctly deal with out of memory conditions. It has explicit tests for that code too. See section 3.1, Out-Of-Memory Testing, of [1].

[1] https://sqlite.org/testing.html
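
The general technique for exercising those code paths is to put a fault-injecting wrapper in front of the allocator and make it fail on demand. The sketch below illustrates the idea only and is not SQLite's code; `test_malloc`, `test_malloc_fail_after` and `concat_dup` are hypothetical names.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical test hook: when fail_countdown is N > 0, the Nth call to
 * test_malloc() from now returns NULL. 0 disables fault injection. */
static int fail_countdown = 0;

void test_malloc_fail_after(int n) { fail_countdown = n; }

void *test_malloc(size_t size)
{
    if (fail_countdown > 0 && --fail_countdown == 0)
        return NULL;                /* simulated out-of-memory */
    return malloc(size);
}

/* Code under test: joins two strings into one newly allocated buffer.
 * Returns NULL on allocation failure; nothing is leaked in that case. */
char *concat_dup(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *r = test_malloc(la + lb + 1);

    if (r == NULL)
        return NULL;
    memcpy(r, a, la);
    memcpy(r + la, b, lb + 1);      /* includes the terminating NUL */
    return r;
}
```

A test driver would then run the same operation in a loop with test_malloc_fail_after(1), (2), (3) and so on until it completes without a simulated failure, checking each time that nothing leaked or was left corrupted.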

One of the things with tight memory systems is that you don't use malloc to begin with, if you can avoid it. C gives you the option.

When you're concatenating strings, you already have storage for those strings. Maybe you can re-use that storage. Maybe you have a static buffer. Maybe you have a fixed size buffer on the stack and the stack use is bounded.
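
As a small sketch of the fixed-buffer case: concatenating into storage the caller already owns needs no heap at all, as long as you check for truncation. `concat_into` is a hypothetical helper name; the only library call is standard snprintf().

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: writes a followed by b into dst (capacity cap).
 * Returns 0 on success, -1 if the result did not fit or formatting
 * failed. No heap allocation takes place. */
int concat_into(char *dst, size_t cap, const char *a, const char *b)
{
    int n = snprintf(dst, cap, "%s%s", a, b);

    if (n < 0 || (size_t)n >= cap)
        return -1;
    return 0;
}
```

With a `char buf[64]` on the stack, `concat_into(buf, sizeof buf, "hello", "world")` has a fixed, known memory cost, and the failure case is a return code rather than a failed malloc.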

A language that forces you into making redundant duplicates onto the heap is terrible in these situations.

And yes there are programs that try to deal with failing mallocs. Again, C gives you the option.

Very, very few C programs can handle running out of disk space. This includes the operating system(s). Get close to filling up the disk, and try various things.

Just recently, I was having a lot of trouble with Windows Update hanging. I finally noticed that free disk space was low. Freed up more space, and WU started working again.

For fun, try:

    #include <stdio.h>
    int main() { printf("hello world\n"); return 0; }
and redirect stdout to a file on a device that is full. Amazingly, it succeeds!
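
The program exits with status 0 because nothing checks whether the output was ever written: printf() only fills a buffer, and the flush at exit fails silently. A version that notices would look something like this sketch (`say_hello` is a made-up name; it takes a FILE * so it can be pointed at any stream):

```c
#include <stdio.h>
#include <stdlib.h>

/* Writes a greeting to `out` and reports whether it really got there.
 * Returns EXIT_SUCCESS only if both the formatted write and the flush
 * succeeded. */
int say_hello(FILE *out)
{
    if (fprintf(out, "hello world\n") < 0)
        return EXIT_FAILURE;
    if (fflush(out) == EOF)         /* buffered data is written here */
        return EXIT_FAILURE;
    return EXIT_SUCCESS;
}
```

Calling `return say_hello(stdout);` from main() gives the shell a non-zero exit status when the device is full.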
I assume you're referring to OpenBSD here, they didn't use snprintf(). They used asnprintf(), which solves the problem of who should allocate (but not who should free).
From the link:

"That means that we have been going through the tree cleaning out all calls to sprintf(), strcpy(), and strcat(). Instead, these things are being rewritten to use asprintf(), snprintf(), strlcpy(), and strlcat()."

Maybe the author made a typo.

Oh yeah, you're right.

Another thing I've done that will work if you have a lot of strcat(), is make a string struct:

    typedef struct {
       int len;      /* current string length */
       int memlen;   /* size of the underlying buffer */
       char *str;
    } ktString;
It keeps track of the string's actual length, and the size of the underlying buffer. Then you can 'override' the various string functions:

    bool ktStrcat(ktString *s1, const ktString *s2);
    bool ktSprintf(ktString *s1, const char *fmt, ...);
These functions will take care of buffer-size checking, and reallocation if necessary. For cases where you need to interface with pre-existing libraries, you can return a cstring(). Make it a function/macro to enable you to change the struct definition in the future:

    #define ktCstr(x) ((x)->str)

then you can pass it into write() or whatever you need:

    write(sock, ktCstr(s), s->len);
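
For what it's worth, here is one way the buffer-size checking and reallocation could be filled in. This is a sketch under assumptions, not the original code: the struct fields and the names ktString, ktStrcat and ktCstr come from the comment above, while `ktGrow`, `ktStrcats` and `ktFree` are hypothetical additions, and strings are passed by pointer.

```c
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    int len;      /* bytes in use, not counting the NUL */
    int memlen;   /* size of the buffer behind str */
    char *str;    /* NULL until the first append */
} ktString;

#define ktCstr(x) ((x)->str)

/* Hypothetical helper: makes room for `extra` more bytes plus a NUL,
 * doubling the buffer as needed. On failure s is left unchanged. */
static bool ktGrow(ktString *s, int extra)
{
    int need, cap;
    char *p;

    if (extra < 0 || extra > INT_MAX / 2 - s->len - 1)
        return false;                   /* would overflow int */
    need = s->len + extra + 1;
    if (need <= s->memlen)
        return true;
    cap = s->memlen > 0 ? s->memlen : 16;
    while (cap < need)
        cap *= 2;
    p = realloc(s->str, (size_t)cap);
    if (p == NULL)
        return false;
    s->str = p;
    s->memlen = cap;
    return true;
}

/* Appends s2 to s1. s2->str is read only after the buffer has grown,
 * so ktStrcat(&s, &s) is safe. */
bool ktStrcat(ktString *s1, const ktString *s2)
{
    int n = s2->len;

    if (!ktGrow(s1, n))
        return false;
    if (n > 0)
        memcpy(s1->str + s1->len, s2->str, (size_t)n);
    s1->len += n;
    s1->str[s1->len] = '\0';
    return true;
}

/* Hypothetical companion: appends a C string that must not point into
 * s1's own buffer. */
bool ktStrcats(ktString *s1, const char *cstr)
{
    size_t n = strlen(cstr);

    if (n > INT_MAX || !ktGrow(s1, (int)n))
        return false;
    memcpy(s1->str + s1->len, cstr, n);
    s1->len += (int)n;
    s1->str[s1->len] = '\0';
    return true;
}

void ktFree(ktString *s)
{
    free(s->str);
    s->str = NULL;
    s->len = s->memlen = 0;
}
```

A zero-initialized ktString (`{0, 0, NULL}`) is a valid empty string here, so there is no separate init call; the caller owns the struct and calls ktFree() when done.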