C considered dangerous | HN Mirror


	C considered dangerous (lwn.net)
	45 points by johnramsden 2853 days ago

8 comments

rwmj 2853 days ago

> He asked: why is there no argument to memcpy() to specify the maximum destination length?

I'm confused by this. The third argument provides the destination length, so what good would a "maximum destination length" do? I guess he must mean that because the length is often computed, you'd need a fourth argument to ensure the length isn't greater than some sane upper bound. But you can easily fix that using an if statement around the memcpy.

vardump 2853 days ago

Perhaps because the memory buffers might be of different size.

Maybe memcpy_oobp (out of bounds protection) signature could be:

  memcpy_oobp(void* dst, size_t dst_size, void* src, size_t src_size);

Then again, I guess you could just as well do:

  memcpy(dst, src, min(dst_size, src_size));

But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.
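For illustration, a minimal sketch of the memcpy_oobp idea above, assuming it simply truncates to the smaller of the two sizes (the same behavior as the min() call). The return value, the const on src, and the assertion are assumptions of this sketch, not part of the proposal:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper following the signature proposed above.
 * Assumption: it returns the number of bytes actually copied, so the
 * caller can detect truncation by comparing against src_size. */
size_t memcpy_oobp(void *dst, size_t dst_size,
                   const void *src, size_t src_size)
{
    /* Never copy more than the smaller of the two buffers. */
    size_t n = dst_size < src_size ? dst_size : src_size;

    /* NULL pointers are only acceptable when nothing is copied. */
    assert(n == 0 || (dst != NULL && src != NULL));

    if (n > 0)
        memcpy(dst, src, n);
    return n;
}
```

A caller that cares about truncation would compare the result with src_size; a caller that ignores it gets a silently shortened copy, which is the main weakness of this variant.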

sebcat 2853 days ago

> But having to explicitly specify both destination and source sizes might have prevented a lot of buffer overwrite bugs.

A good way to prevent this is to have a buffer abstraction, where the size is a property of the type, e.g.,

    typedef struct {
      size_t bytes_used;
      size_t capacity;
      void *data;
    } buf_t;

    int buf_init(buf_t *buf);
    void buf_cleanup(buf_t *buf);
    void buf_copy(buf_t *dst, buf_t *src);
    /* ... */

Of course, it doesn't prevent people from using memcpy directly.
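A rough sketch of how the declarations above might be filled in. Several things here are assumptions made for the example: the initial capacity, the buf_reserve and buf_append helpers, and the fact that buf_copy returns int (instead of void) so that an allocation failure can be reported:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
  size_t bytes_used;
  size_t capacity;
  void *data;
} buf_t;

/* Assumption: the initial capacity is not specified in the comment. */
#define BUF_INITIAL_CAPACITY 64

int buf_init(buf_t *buf) {
  assert(buf != NULL);
  buf->data = malloc(BUF_INITIAL_CAPACITY);
  if (buf->data == NULL)
    return -1;
  buf->capacity = BUF_INITIAL_CAPACITY;
  buf->bytes_used = 0;
  return 0;
}

void buf_cleanup(buf_t *buf) {
  assert(buf != NULL);
  free(buf->data);
  buf->data = NULL;
  buf->capacity = 0;
  buf->bytes_used = 0;
}

/* Hypothetical helper: make sure at least `needed` bytes fit. */
static int buf_reserve(buf_t *buf, size_t needed) {
  if (needed <= buf->capacity)
    return 0;
  void *p = realloc(buf->data, needed);
  if (p == NULL)
    return -1; /* the old storage is still valid and untouched */
  buf->data = p;
  buf->capacity = needed;
  return 0;
}

/* Hypothetical helper: append raw bytes, growing the buffer as needed. */
int buf_append(buf_t *buf, const void *bytes, size_t len) {
  assert(buf != NULL);
  if (len == 0)
    return 0;
  if (len > SIZE_MAX - buf->bytes_used)
    return -1; /* bytes_used + len would overflow size_t */
  if (buf_reserve(buf, buf->bytes_used + len) != 0)
    return -1;
  memcpy((char *)buf->data + buf->bytes_used, bytes, len);
  buf->bytes_used += len;
  return 0;
}

/* Assumption: returns 0 on success, -1 if dst could not be grown. */
int buf_copy(buf_t *dst, const buf_t *src) {
  assert(dst != NULL && src != NULL);
  if (dst == src)
    return 0;
  if (buf_reserve(dst, src->bytes_used) != 0)
    return -1;
  if (src->bytes_used > 0)
    memcpy(dst->data, src->data, src->bytes_used);
  dst->bytes_used = src->bytes_used;
  return 0;
}
```

The point of the design is that every memcpy sits behind a capacity check, so callers never pass a length by hand.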

rwmj 2853 days ago

I guess so. One of the LWN comments mentions a Microsoft function memcpy_s defined as:

    errno_t memcpy_s(void *dest, size_t destSize, const void *src, size_t count);

which is effectively equivalent to your memcpy_oobp function.

However the Microsoft function also returns an error code which must be checked (because count might be larger than destSize), thus providing another way for the programmer to screw up. I'm not sure if this is better or worse than just copying the min() as in your second example. It probably depends on the situation.

lomnakkus 2853 days ago

Using min() seems like it could be incredibly dangerous as an "implicit" behavior, not to mention surprising.

I'd wager it'd be much better to just specify that abort() gets called in the "overflow" case. (Given that overflow is basically never what you want anyway.)

Yeah, it'll crash, but at least it won't be surprising/undefined behavior.
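A minimal sketch of that "abort on overflow" behavior, for comparison with the min() variant. The name memcpy_or_abort, the diagnostic message, and the decision to treat NULL pointers with a non-zero count as fatal are all assumptions of this example:

```c
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical checked copy: terminates the program instead of
 * truncating or overflowing when count does not fit in dst. */
void *memcpy_or_abort(void *dst, size_t dst_size,
                      const void *src, size_t count)
{
    if (count > dst_size) {
        fprintf(stderr, "memcpy_or_abort: %zu bytes do not fit in %zu\n",
                count, dst_size);
        abort();
    }
    /* Assumption: NULL with a non-zero count is also a fatal bug. */
    if (count > 0 && (dst == NULL || src == NULL))
        abort();

    if (count > 0)
        memcpy(dst, src, count);
    return dst;
}
```

There is no error code to forget to check: the only outcomes are a complete copy or a crash at the call site.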

rwmj 2853 days ago

For extra fun, the Microsoft implementation of memcpy_s returns an error instead of crashing if either of the pointers is NULL (thankfully doesn't apply if the copy size is 0). There's a reason I don't like writing software for Windows ...

rurban 2850 days ago

Just use memcpy_s. This has the destbuf size argument. It's even in C11, but you need the safeclib or MSVC, as no libc cares about the safety annex.

deng 2853 days ago

Thankfully, compiler warnings and static analyzers have become much better in recent years. For instance, gcc can now warn about a missing 'break;' mentioned in the article (you need to add a special comment like '/* fall through */' if it's intentional). Also, clang-tidy is getting better with each release. I highly recommend using it, although the initial configuration will take some time, depending on the code base.
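As a small illustration of the fall-through warning: with GCC 7 or later, -Wimplicit-fallthrough (also enabled by -Wextra) flags a case that runs into the next one unless a marker comment is present. The function and its flag values below are made up for the example:

```c
/* Hypothetical example: each level implies all the lower ones. */
int level_to_flags(int level)
{
    int flags = 0;

    switch (level) {
    case 2:
        flags |= 4;
        /* fall through */
    case 1:
        flags |= 2;
        /* fall through */
    case 0:
        flags |= 1;
        break;
    default:
        flags = -1; /* unknown level */
        break;
    }
    return flags;
}
```

Deleting either marker comment should make gcc -Wextra report that the statement may fall through, which is how an accidentally missing break gets caught.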

xroche 2853 days ago

Alas! strlcpy and strlcat are still not present in glibc, despite numerous attempts, mainly for religious reasons (i.e. "BSD sucks").

And yes, having something like "if (strlcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }" is much better than a buffer overrun. But security does not always seem to be a real concern, compared to politics.
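Where strlcat is not available, the same check can be approximated with a local helper. The following is only a sketch that mimics the documented BSD semantics (the return value is the length the result would have had without truncation); the name bounded_strcat is hypothetical, chosen to avoid clashing with a libc that does provide strlcat:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for strlcat: appends src to dst without writing
 * past dst[size - 1], and always NUL-terminates if there is room. */
size_t bounded_strcat(char *dst, const char *src, size_t size)
{
    /* Length of dst, but never scanning beyond the buffer. */
    const char *nul = memchr(dst, '\0', size);
    size_t dlen = nul ? (size_t)(nul - dst) : size;
    size_t slen = strlen(src);

    if (dlen == size)
        return size + slen; /* dst is not terminated: append nothing */

    size_t room = size - dlen - 1; /* space left, excluding the NUL */
    size_t n = slen < room ? slen : room;

    memcpy(dst + dlen, src, n);
    dst[dlen + n] = '\0';
    return dlen + slen; /* >= size means the result was truncated */
}
```

With this helper the check reads "if (bounded_strcat(buffer, src, sizeof(buffer)) >= sizeof(buffer)) { abort(); }".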

yason 2853 days ago

C is dangerous partly because assembly language is dangerous. We will always need some layer on top of assembly that is mostly unchecked and reflects back to how cpu instructions work. This is probably something we must live with until we have processors with the notion of type checking.

C is dangerous partly because of swaths of undefined behaviour and loose typing. Eliminating much of the undefined behaviour, either by defining the behaviour or by forcing the compiler to refuse to compile undefined behaviour, could be of some help. There are still classes of undefined behaviour that cannot be worked around, but narrowing them down to a minimal set would make them easier to deal with. Strong typing would help build programs that won't compile unless they are correct at least in terms of types of values.

C is dangerous partly because of the stupid standard library, which isn't necessarily a core language problem, as other libraries can be used. The standard library should be replaced with any of the sane libraries that different projects have written for themselves to avoid using libc. It's perfectly possible not to have minefields like memcpy() or strcpy(), or functions like strtok(), which introduces nice invisible access to internal static storage (fixed by a re-entrant variant like strtok_r()), or strtol(), which requires you to do multiple checks to determine how the function actually failed. The problem here is that if there are X standards, adding one to replace them all will make it X+1 standards.
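To make the strtol() complaint concrete, here is a sketch of the checks needed to tell its failure modes apart. The wrapper name parse_long and the result enum are invented for the example; only strtol() and errno come from the standard library:

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical result codes, one per way strtol() can "fail". */
typedef enum {
    PARSE_OK,
    PARSE_NO_DIGITS,     /* nothing could be converted */
    PARSE_OUT_OF_RANGE,  /* value does not fit in a long */
    PARSE_TRAILING_JUNK  /* a valid number followed by other characters */
} parse_result;

parse_result parse_long(const char *s, long *out)
{
    char *end;

    errno = 0; /* strtol() only sets errno, it never clears it */
    long value = strtol(s, &end, 10);

    if (end == s)
        return PARSE_NO_DIGITS;
    if (errno == ERANGE)
        return PARSE_OUT_OF_RANGE;
    if (*end != '\0')
        return PARSE_TRAILING_JUNK;

    *out = value;
    return PARSE_OK;
}
```

Three separate checks (the end pointer, errno, and the character at the end pointer) are needed for a single conversion, and skipping any one of them lets bad input through silently.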

Yet, good programmers already avoid 99% of the problems by manually policing themselves. For them, C is simple, productive, and manageable in a lot more cases and domains than it is for the less experienced programmers.

pjmlp 2853 days ago

Ironically other systems programming languages developed outside AT&T walls since 1961 did not suffer from the majority of C's pain points regarding memory corruption.

I really wish Bell Labs had been allowed to sell UNIX.

IshKebab 2853 days ago

Terrible title. It's not remotely news that C is dangerous. This talk seems to be about ways of mitigating the dangers. Why not call it "Mitigating the dangers of C" or something else that is less of a tired cliche?

pjmlp 2853 days ago

Because "Making C Less Dangerous" is the actual title of the talk, and "Towards less dangerous C" is part of the agenda?

fithisux 2853 days ago

The title is completely misleading.

vardump 2853 days ago

I write a ton of C and I completely agree with the title. With 20+ years of experience.

Kernel drivers and embedded system bare metal firmware.

The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests. And that something can lead to disastrous crashes and security vulnerabilities.

FraKtus 2853 days ago

Don't you think that with the tools we have now it's easier to control the quality of code produced (Clang memory sanitizers and so on)? I feel more at ease to ship C code today after instrumenting it than a few years ago...

vardump 2853 days ago

Tooling absolutely helps to reduce defects. That's why you use them.

That said, sometimes I'm shocked what kind of disasters get past the analyzers.

Stakes are higher than ever. It's not just about functional correctness and avoiding crashes anymore. Your code needs to be secure against outside world malicious actions. Getting rid of counterintuitive security vulnerabilities is very, very hard.

pjmlp 2853 days ago

I would say that is why security conscious developers use them.

Sadly we are a very, very tiny percentage, as proven by Herb Sutter's question to the audience at CppCon (1% of the audience answered positively), and by the frequent CVE updates.

pjmlp 2853 days ago

Not really, as is proven on an almost daily basis.

https://cve.mitre.org/cgi-bin/cvekey.cgi?keyword=memory+corr...

FraKtus 2853 days ago

How do you know that developers working on those used tools such as the Clang Memory Sanitizer?

pjmlp 2853 days ago

Because many on that list are well known FOSS projects that supposedly have such processes in place, including manual review before accepting patches into mainline, like the Linux kernel being discussed here.

raxxorrax 2853 days ago

For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.

You can still overwrite memory but it suddenly became much less likely.

vardump 2853 days ago

> For embedded systems I mostly go with "dynamic memory allocation of any kind is evil" and that solves a lot of issues already.

Yeah, bare metal systems often don't allocate at all. Although one sin they often do commit is using the same buffer for multiple purposes. What could go wrong...

Perhaps even more common is allocating a buffer on the stack and writing past its bounds somehow. Also, DMA to/from the stack is usually not a great idea...

The above things sound dumb, but can easily happen when you build your abstraction layers and use them carelessly.

MrBuddyCasino 2853 days ago

> DMA to/from stack

wait what oh my god

maccard 2853 days ago

That only eliminates a certain case of bugs. There are still plenty of foot-shotguns available - memcpy/memset, strlen, gets/puts, printf, any file IO, networking calls, etc.

AlotOfReading 2853 days ago

This is my view as well, from the same industry. However, the quality of the tools available in C to deal with its issues far exceeds that of the tools in any other language. I would love to drop C from all my systems, but the alternatives simply aren't there.

pjmlp 2853 days ago

The alternatives were there before UNIX took over the server room and workstation markets.

Just imagine how many millions the IT industry and PhD research have spent developing solutions that would improve C's safety, many of them largely ignored by most C developers.

abainbridge 2853 days ago

> The problem with C is that in any bigger project something always slips through even the best programmers, reviewers, static analysis and unit tests.

That's also true of all the other languages.

yaris 2853 days ago

Well, I can say the same about Python, Erlang, Lua, in addition to C and C++. I believe C is not worse than these languages, only that C requires different (sometimes very different) skills and discipline.

vardump 2853 days ago

I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C. You really have to try to overwrite memory in those languages.

Of course you can shoot yourself in the foot with stuff like metatables in Lua and Python metaclasses and whatnot. Then again, you should see some of the C macro messes around...

Anyway, I don't like it when people defend C with that age-old argument that it requires a clever, disciplined programmer who never makes mistakes. Because either such programmers don't exist or they're very rare.

notacoward 2853 days ago

> I'm absolutely sure same skill level programmer will create less defects in Python, Erlang and Lua than in C.

Fewer defects, or just different (arguably less severe) defects? It's great that you're sure, but evidence would be even better.

vardump 2852 days ago

Ok, that's a fair point. I don't have the evidence for that.

Scripting languages do have their pitfalls. Lua and Python can have type mismatches and even typos causing misbehavior, things that usually aren't issues with C.

However, you do need significantly less code than in C.

pjmlp 2853 days ago

Python, Erlang, Lua = Logic Errors

C and C++ = Logic Errors + Memory Corruption + UB

From this point of view,

Σ Logic Errors < Σ (Logic Errors + Memory Corruption + UB)

jcelerier 2853 days ago

hmmm... In my experience, I have had far fewer logic errors in C++ than in Python or JS because I tend to try to encode the domain logic into the types as much as possible, so that I can piggyback on the compiler.

pjmlp 2853 days ago

And how many memory corruption and UB errors did your Python and JS code have?

jojoo 2853 days ago

It's probably a reference to https://en.m.wikipedia.org/wiki/Considered_harmful

aogl 2853 days ago

I agree, it's very click-baity. C is actually great and is only really dangerous because it gives the programmer so much control.

simias 2853 days ago

I'm a C coder first and foremost and I strongly disagree with this mentality (even though I know it's extremely pervasive in our circles). "Footguns don't make bugs, coders do" is technically true but if we could keep the footguns at a minimum and only get them out of the locker when truly necessary instead of having them spread all over the place all the time I'm sure it wouldn't hurt.

C is a very useful language and one you basically have to know if you're interested in low level software but it's very, very far from flawless.

If you look at many high profile software vulnerabilities of late (heartbleed, goto fail, etc...) many can be traced to the lack of safety and/or bad ergonomics of the C language.

We need to grow up as an industry and accept that using a seatbelt doesn't mean that you're a bad driver. Shit happens.

pietroglyph 2853 days ago

> C is actually great and is only really dangerous because it gives the programmer so much control.

This doesn't actually refute the assertion that C is dangerous :)

Control and increased safety are not mutually exclusive. I'll take safe-by-default, unsafe-when-asked any day. It's not 1972 anymore.

buboard 2853 days ago

"Programmers using C are considered dangerous"

willtim 2853 days ago

C actually gives one rather limited control over modern hardware with its memory hierarchies and superscalar CPUs. Programming language research has also moved on a lot since the 70's, which is why we should be considering less dangerous languages (e.g. better type systems and less undefined behaviour). Languages like ATS and Rust also support explicit memory management, whilst being a whole lot safer.

vardump 2853 days ago

C alone doesn't provide the control directly, but you as a programmer can absolutely leverage C to take control of the memory hierarchies by controlling your data access patterns. IOW, high locality of reference.

Good C compilers will take care of superscalar CPU friendliness most of the time. When they don't, you can always drop down to the assembler level, and it'll mesh well with C.

willtim 2853 days ago

High-locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell has support for this). But this is a long way from having complete control over how each level of the memory hierarchy is used.

Likewise most static languages defer to the compiler for CPU-specific performance optimisations and will permit foreign native calls into C or ASM where necessary. So I don't see how this is an argument in C's favour.

vardump 2853 days ago

> High-locality of reference can be achieved in any language that supports unboxed types, it doesn't require C (even a very high-level language like Haskell supports this).

You often also need correct alignment. Cache-line or page. Your unboxed access across two pages can cause two TLB misses, L1 misses etc. Not to mention two page faults.

Sometimes you need to ensure two (or more) buffers are NOT aligned in a particular way to avoid interfering with CPU caching mechanisms.
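A small sketch of the alignment point, using C11's aligned_alloc() to get a buffer that starts on a cache-line boundary. The 64-byte line size is an assumption (common on x86-64 but not universal), and alloc_cache_aligned is a made-up helper name:

```c
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines; check the target CPU before relying on it. */
#define CACHE_LINE 64

/* Hypothetical helper: returns a buffer of at least `size` bytes whose
 * address is a multiple of CACHE_LINE, or NULL on failure.
 * Release it with free(). */
void *alloc_cache_aligned(size_t size)
{
    if (size > SIZE_MAX - (CACHE_LINE - 1))
        return NULL; /* rounding up would overflow */

    /* aligned_alloc() wants the size to be a multiple of the alignment. */
    size_t rounded = (size + CACHE_LINE - 1) / CACHE_LINE * CACHE_LINE;
    if (rounded == 0)
        rounded = CACHE_LINE;

    return aligned_alloc(CACHE_LINE, rounded);
}
```

An object no larger than one cache line placed at the start of such a buffer cannot straddle two lines. Deliberately misaligning two buffers relative to each other, as mentioned above, would need an extra offset on top of this.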

pjmlp 2853 days ago

Even in the 70's there was NEWP, PL/I, PL/S, PL/8, Concurrent Pascal, Mesa, BLISS, Modula-2, ....

C beats them all in implicit conversions and opportunities for memory corruption.

Their major sin was to be tied to commercial OSes, instead of one with source code available for a symbolic price to universities.

millstone 2853 days ago

Are you suggesting that other languages provide more control over modern hardware?

willtim 2853 days ago

Yes. Currently, access to modern hardware features is either via cumbersome APIs (e.g. NUMA, AVX intrinsics), handled via the OS (e.g. paging, scheduling), or handled via the hardware itself (cache memory hierarchy). The problem will get worse as modern CPUs and machines continue to diverge from those originally targeted by C in the 1970s.

xvilka 2853 days ago

Hopefully the Zig [1] language will become a better alternative to C in the upcoming years. I'm not talking about higher-level code, where Rust or Go can be a better choice.

[1] https://ziglang.org/

pjmlp 2853 days ago

No language can become an alternative to C in the context of UNIX-like OSes, because no one is going to rewrite them from scratch, given their symbiotic nature.

Even if the complete userspace of Aix, HP-UX, *BSD, GNU/Linux, OS X, iOS, Solaris,.... gets rewritten in something else, there will always be the kernel written in C.

Hence why improving C's lack of safety is so important to get a proper IT stack.

abainbridge 2853 days ago

The problem with Zig is that they changed almost everything. I think there's a high risk they introduced new design problems that we won't know about fully until Zig has been used in anger for 10 years.

I've always felt that C is near the sweet spot. I'd rather see a minimal change to C that broke backwards compatibility (because it has to) and fixed the top ten simple problems.

amelius 2853 days ago

Why don't they use valgrind?

deng 2853 days ago

The kernel has CONFIG_HAVE_DEBUG_KMEMLEAK.