> He asked: why is there no argument to memcpy() to specify the maximum destination length?
I'm confused by this. The third argument provides the destination length, so what good would a "maximum destination length" do? I guess he must mean that because the length is often computed, you'd need a fourth argument to ensure the length isn't greater than some sane upper bound. But you can easily fix that using an if statement around the memcpy.
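A minimal sketch of that "if statement around the memcpy" idea. The helper name `copy_checked` and its return convention are made up for illustration; they are not taken from the talk or from any library.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copies n bytes only if they fit into a destination
 * of dst_size bytes. Returns 0 on success, -1 if the copy was refused. */
static int copy_checked(void *dst, size_t dst_size, const void *src, size_t n)
{
    if (n > dst_size) {
        return -1;      /* computed length exceeds the destination */
    }
    memcpy(dst, src, n);
    return 0;
}
```

The caller still has to check the return value, which is the same weakness as any error-code design.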
which is effectively equivalent to your memcpy_oobp function.
However the Microsoft function also returns an error code which must be checked (because count might be larger than destSize), thus providing another way for the programmer to screw up. I'm not sure if this is better or worse than just copying the min() as in your second example. It probably depends on the situation.
Using min() seems like it could be incredibly dangerous as an "implicit" behavior, not to mention surprising.
I'd wager it'd be much better to just specify that abort() gets called in the "overflow" case. (Given that overflow is basically never what you want anyway.)
Yeah, it'll crash, but at least it won't be surprising or undefined behavior.
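To make the two options concrete, here is a rough sketch of both policies side by side. `copy_truncating` and `copy_or_abort` are hypothetical names, not existing functions.

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical min() policy: silently copies at most dst_size bytes.
 * Returns the number of bytes actually copied. */
static size_t copy_truncating(void *dst, size_t dst_size,
                              const void *src, size_t n)
{
    size_t m = n < dst_size ? n : dst_size;
    memcpy(dst, src, m);
    return m;
}

/* Hypothetical abort() policy: an oversized copy terminates the program
 * instead of truncating or overflowing. */
static void copy_or_abort(void *dst, size_t dst_size,
                          const void *src, size_t n)
{
    if (n > dst_size) {
        fprintf(stderr, "copy_or_abort: %zu bytes do not fit in %zu\n",
                n, dst_size);
        abort();
    }
    memcpy(dst, src, n);
}
```

With the truncating version, a caller that ignores the return value ends up with a short (and possibly unterminated) buffer and no indication that anything went wrong.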
For extra fun, the Microsoft implementation of memcpy_s returns an error instead of crashing if either of the pointers is NULL (thankfully doesn't apply if the copy size is 0). There's a reason I don't like writing software for Windows ...
Thankfully, compiler warnings and static analyzers have become much better in recent years. For instance, gcc can now warn about a missing 'break;' mentioned in the article (you need to add a special comment like '/* fall through */' if it's intentional). Also, clang-tidy is getting better with each release. I highly recommend using it, although the initial configuration will take some time, depending on the code base.
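For reference, a small sketch of what the fall-through comment looks like. The function and its flag values are invented for illustration; with gcc the relevant warning is `-Wimplicit-fallthrough`, which `-Wextra` turns on.

```c
/* Hypothetical example: builds a bitmask from a "level" value. */
static int level_flags(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 4;
        /* fall through */
    case 1:
        flags |= 2;
        break;      /* removing this break would trigger the warning */
    default:
        flags |= 1;
        break;
    }
    return flags;
}
```

The comment has to sit directly before the next `case` label for gcc to treat the fall-through as intentional.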
Alas! strlcpy and strlcat are still not present in glibc, despite numerous attempts, mainly for religious reasons (i.e. "BSD sucks").
And yes, having something like "if (strlcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }" is much better than a buffer overrun. But security does not always seem to be a real concern, compared to politics.
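A sketch of that truncation check, written so it builds on a libc without strlcat. `cat_bounded` is a hypothetical stand-in that is meant to follow strlcat's return convention (the length of the string it tried to create); it is not the BSD implementation.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in with strlcat-like semantics: appends src to dst
 * without writing past size bytes, and returns the length the full result
 * would have had. */
static size_t cat_bounded(char *dst, const char *src, size_t size)
{
    const char *end = memchr(dst, '\0', size);
    size_t dlen = end ? (size_t)(end - dst) : size;
    size_t slen = strlen(src);

    if (dlen == size) {
        return size + slen;     /* dst is not terminated within size */
    }

    size_t room = size - dlen - 1;
    size_t n = slen < room ? slen : room;
    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen;
}

/* Returns 0 if src fit completely, -1 if the result was truncated. */
static int append_checked(char *buf, size_t size, const char *src)
{
    return cat_bounded(buf, src, size) >= size ? -1 : 0;
}
```

Whether the caller aborts or handles the -1 is a policy choice; the point is that truncation is detectable from the return value.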
C is dangerous partly because assembly language is dangerous. We will always need some layer on top of assembly that is mostly unchecked and reflects back to how cpu instructions work. This is probably something we must live with until we have processors with the notion of type checking.
C is dangerous partly because of swaths of undefined behaviour and loose typing. Eliminating much of the undefined behaviour, either by defining the behaviour or by forcing the compiler to refuse to compile it, could be of some help. There are still classes of undefined behaviour that cannot be worked around, but narrowing them down to a minimal set would make them easier to deal with. Strong typing would help build programs that won't compile unless they are correct at least in terms of the types of values.
C is dangerous partly because of the stupid standard library, which isn't necessarily a core language problem as other libraries can be used. The standard library should be replaced with any of the sane libraries that different projects have written for themselves to avoid using libc. It's perfectly possible not to have minefields like memcpy() or strcpy(), or functions like strtok() that introduce nice invisible access to internal static storage (fixed by a re-entrant variant like strtok_r()), or functions like strtol() that require you to do multiple checks to determine how the function actually failed. The problem here is that if there are X standards, adding one to replace them all will make it X+1 standards.
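To illustrate the "multiple checks" complaint, here is roughly what a careful strtol() call ends up looking like. `parse_int` is a hypothetical wrapper, and rejecting trailing characters is a policy choice of this sketch.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical wrapper: parses a base-10 int from s.
 * Returns 0 and stores the value in *out on success, -1 on any failure. */
static int parse_int(const char *s, int *out)
{
    char *end;
    long v;

    errno = 0;                      /* strtol only sets errno on error */
    v = strtol(s, &end, 10);

    if (end == s) {
        return -1;                  /* no digits were consumed */
    }
    if (*end != '\0') {
        return -1;                  /* trailing junk after the number */
    }
    if (errno == ERANGE) {
        return -1;                  /* value does not fit in a long */
    }
    if (v < INT_MIN || v > INT_MAX) {
        return -1;                  /* fits in long but not in int */
    }
    *out = (int)v;
    return 0;
}
```

That is four separate failure checks for one conversion, and leaving out any one of them still compiles cleanly.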
Yet, good programmers already avoid 99% of the problems by manually policing themselves. For them, C is simple, productive, and manageable in a lot more cases and domains than it is for the less experienced programmers.
Ironically other systems programming languages developed outside AT&T walls since 1961 did not suffer from the majority of C's pain points regarding memory corruption.
I really wish Bell Labs had been allowed to sell UNIX.
Terrible title. It's not remotely news that C is dangerous. This talk seems to be about ways of mitigating the dangers. Why not call it "Mitigating the dangers of C" or something else that is less of a tired cliche?
I write a ton of C and I completely agree with the title. With 20+ years of experience.
Kernel drivers and embedded system bare metal firmware.
The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests. And that something can lead to disastrous crashes and security vulnerabilities.
Don't you think that with the tools we have now it's easier to control the quality of code produced (Clang memory sanitizers and so on)?
I feel more at ease to ship C code today after instrumenting it than a few years ago...
Tooling absolutely helps to reduce defects. That's why you use them.
That said, sometimes I'm shocked what kind of disasters get past the analyzers.
Stakes are higher than ever. It's not just about functional correctness and avoiding crashes anymore. Your code needs to be secure against outside world malicious actions. Getting rid of counterintuitive security vulnerabilities is very, very hard.
I would say that is why security conscious developers use them.
Sadly we are a very, very tiny percentage, as proven by Herb Sutter's question to the audience at CppCon (1% of the audience answered positively), and by the frequent CVE updates.
Because many on that list are well known FOSS projects that supposedly have such processes in place, including manual review before accepting patches into mainline, like the Linux kernel being discussed here.
> For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.
Yeah, bare metal systems often don't allocate at all. Although one sin they often do commit is using the same buffer for multiple purposes. What could go wrong...
Perhaps even more common is allocating a buffer on stack and writing past bounds somehow. Also DMA to/from stack is usually not a great idea...
Above things sound dumb, but can easily happen when you build your abstraction layers and use them carelessly.
That only eliminates a certain case of bugs. There are still plenty of foot-shotguns available - memcpy/memset, strlen, gets/puts, printf, any file IO, networking calls, etc.
This is my view as well, from the same industry. However, the quality of the tools available in C to deal with its issues far exceeds those in any other language. I would love to drop C from all my systems, but the alternatives simply aren't there.
The alternatives were there before UNIX took over server room and workstation market.
Just imagine how many millions the IT industry and PhD research have spent developing solutions that would improve C's safety, many of them largely ignored by most C developers.
> The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests.
Well, I can say the same about Python, Erlang, Lua, in addition to C and C++. I believe C is not worse than these languages, only that C requires different (sometimes very different) skills and discipline.
I'm absolutely sure a programmer of the same skill level will create fewer defects in Python, Erlang and Lua than in C. You really have to try to overwrite memory in those languages.
Of course you can shoot yourself in the foot with stuff like metatables in Lua and Python metaclasses and whatnot. Then again you should see some of the C macro messes around...
Anyways, I don't like it when people defend C with the age-old argument that it requires a clever, disciplined programmer who never makes mistakes. Because either such programmers don't exist or they're very rare.
Ok, that's a fair point. I don't have the evidence for that.
Scripting languages do have their pitfalls. Lua and python can have type mismatches and even typos causing misbehavior, things that usually aren't issues with C.
However, you do need significantly less code than in C.
hmmm... In my experience, I have had far fewer logic errors in C++ than in Python or JS because I tend to try to encode the domain logic into the types as much as possible, so that I can piggyback on the compiler.
I'm a C coder first and foremost and I strongly disagree with this mentality (even though I know it's extremely pervasive in our circles). "Footguns don't make bugs, coders do" is technically true but if we could keep the footguns at a minimum and only get them out of the locker when truly necessary instead of having them spread all over the place all the time I'm sure it wouldn't hurt.
C is a very useful language and one you basically have to know if you're interested in low level software but it's very, very far from flawless.
If you look at many high profile software vulnerabilities of late (heartbleed, goto fail, etc...) many can be traced to the lack of safety and/or bad ergonomics of the C language.
We need to grow up as an industry and accept that using a seatbelt doesn't mean that you're a bad driver. Shit happens.
C actually gives one rather limited control over modern hardware with its memory hierarchies and superscalar CPUs. Programming language research has also moved on a lot since the 70's, which is why we should be considering less dangerous languages (e.g. better type systems and less undefined behaviour). Languages like ATS and Rust also support explicit memory management, whilst being a whole lot safer.
C alone doesn't provide the control directly, but you as a programmer can absolutely leverage C to take control of the memory hierarchies by controlling your data access patterns. IOW, high locality of reference.
Good C-compilers will most of the time take care of the superscalar CPU friendliness. When they don't, you can always drop down to the assembler level, and it'll mesh well with C.
High locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell has support for this). But this is a long way from having complete control over how each level of the memory hierarchy is used.
Likewise most static languages defer to the compiler for CPU-specific performance optimisations and will permit foreign native calls into C or ASM where necessary. So I don't see how this is an argument in C's favour.
> High-locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell supports this).
You often also need correct alignment. Cache-line or page. Your unboxed access across two pages can cause two TLB misses, L1 misses etc. Not to mention two page faults.
Sometimes you need to ensure two (or more) buffers are NOT aligned in a particular way to avoid interfering with CPU caching mechanisms.
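As a rough illustration of the alignment point, C11 does offer `aligned_alloc` for this. The sketch below assumes 64-byte cache lines, which is common but not universal, and `alloc_cache_aligned` is a made-up helper name.

```c
#include <stddef.h>
#include <stdlib.h>

#define CACHE_LINE 64u   /* assumption: 64-byte cache lines */

/* Hypothetical helper: returns a buffer of at least n bytes that starts on
 * a cache-line boundary, or NULL on failure. Release it with free(). */
static void *alloc_cache_aligned(size_t n)
{
    size_t rounded;

    if (n == 0) {
        n = 1;                      /* avoid a zero-size request */
    }
    /* C11 wants the size to be a multiple of the alignment. */
    rounded = (n + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded < n) {
        return NULL;                /* size computation overflowed */
    }
    return aligned_alloc(CACHE_LINE, rounded);
}
```

Deliberately offsetting two buffers so they do not map to the same cache sets takes more manual work than this and depends on the specific CPU.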
Yes. Currently access to modern hardware features is either via cumbersome APIs (e.g. NUMA, AVX intrinsics), handled via the OS (e.g. paging, scheduling), or handled via the hardware itself (cache memory hierarchy). The problem will get worse as modern CPUs and machines continue to diverge from those originally targeted by C in the 1970s.
Hopefully Zig [1] language will become a better alternative to C in upcoming years. Not talking about higher level code where Rust or Go can be a better choice.
No language can become an alternative to C in the context of UNIX like OS because no one is going to re-write them from scratch, given their symbiotic nature.
Even if the complete userspace of AIX, HP-UX, *BSD, GNU/Linux, OS X, iOS, Solaris, ... gets rewritten in something else, there will always be the kernel written in C.
Hence why improving C's lack of safety is so important to get a proper IT stack.
The problem with Zig is that they changed almost everything. I think there's a high risk they introduced new design problems that we won't know about fully until Zig has been used in anger for 10 years.
I've always felt that C is near the sweet spot. I'd rather see a minimal change to C that broke backwards compatibility (because it has to) and fixed the top ten simple problems.