by pcwalton 3688 days ago
This is obviously a bitter rant, and devolves into uncomfortably ageist territory about halfway through.

I do agree that we should be moving away from C and C++, though. It's pretty simple, really: C was a pretty good language in 1978. We didn't know a lot of things in 1978 that we do now in 2016. It now makes sense to revisit those decisions in light of nearly 40 years of practice. The so-called "PL Renaissance" has given us a whole host of new languages which have steadily chipped away at the dominance of C and C++, and I think this is a healthy trend that ought to continue.

5 comments

I'm ready for the hate, so here we go... C was not a well-designed language in 1978.

The fact that C arrays decay to pointers without any bounds is single-handedly responsible for a huge chunk, possibly even the majority, of all RCEs, worms, malware, and exploits. Ever. In the history of computing.

It was a bad design.

It was a bad design in 1978.

It was known to be a bad design in 1978.

Other languages knew that checking array bounds was important, including for security. The internet made the impact of using C much more devastating but people were exploiting buffer overflows in the 80s to great effect. Some of C's predecessors/contemporaries passed a length as the first part of an array so bounds-checking was possible, though that has the downside of not being able to pass slices of an array without copying.

C could have included an arrayref type that was a length + base pointer, and let array l-values decay to an arrayref instead of a pointer. Then taking a slice of an array would not require copying elements. You could still take the address of an individual element. This would not have required much work to implement, even in 1978! Maybe the first compilers didn't insert array bounds checks, but at least the entire design wouldn't preclude them. Let's say you even spell arrayref as []. It would mean sizeof() works on arrays passed to functions.

  void wat(int[] values) {
    for (int i = 0; i < sizeof(values) / sizeof(values[0]); i++) {
      printf("look ma, no buffer overflows! %d", values[i]);
    }
  }

(Yes, I know this is not K&R syntax)
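For readers who want to see the arrayref idea in code, here is a minimal sketch in today's C. It emulates a length + base pointer pair with a struct. The names `int_arrayref`, `ARRAYREF_OF`, `arrayref_slice`, `arrayref_get` and `arrayref_sum` are all hypothetical and exist in no standard; this is one possible shape, not the design proposed above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical "arrayref": a base pointer plus an element count.
   Nothing here is part of standard C. */
typedef struct {
    int   *base;
    size_t len;
} int_arrayref;

/* Build an arrayref from a real array while its size is still known.
   Only valid on true arrays, not on pointers. */
#define ARRAYREF_OF(arr) \
    ((int_arrayref){ (arr), sizeof(arr) / sizeof((arr)[0]) })

/* Take the slice [start, end) without copying any elements. */
static int_arrayref arrayref_slice(int_arrayref a, size_t start, size_t end)
{
    assert(start <= end && end <= a.len);
    return (int_arrayref){ a.base + start, end - start };
}

/* Checked access: returns 0 and leaves *out untouched when i is out of bounds. */
static int arrayref_get(int_arrayref a, size_t i, int *out)
{
    if (i >= a.len)
        return 0;
    *out = a.base[i];
    return 1;
}

/* The callee can recover the length, unlike with a decayed pointer. */
static long arrayref_sum(int_arrayref a)
{
    long total = 0;
    for (size_t i = 0; i < a.len; i++)
        total += a.base[i];
    return total;
}
```

A caller would write something like `arrayref_sum(arrayref_slice(ARRAYREF_OF(data), 1, 4))`. The slice shares storage with `data`, and the length travels with the pointer across the function call.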

Maybe you can forgive C for the stupid header compilation model (why let the compiler do what you can make the programmer do by hand?). You can understand why they might not have foreseen the need for namespaces. K&R didn't invent the macro system so that's not even their fault.

What is unforgivable is the horribly stupid design of C's arrays.

I actually think it would be beneficial if the standards committee added arrayref now. It won't fix all the busted C code but at least you could start improving the #1 problem. Compilers could eventually adopt a flag to prohibit arrays from decaying directly to pointers. You'd probably have to introduce lengthof() to avoid confusion and use some other syntax to declare one, maybe array(int) or something.

I suspect this has a -lot- to do with performance.

When C was designed, and even today, there are systems without pipelining, where it is expensive (in time) to de-reference a memory address and follow that pointer.

I do not dispute that the design you suggest would be safer, and would even have advantages for slicing; but that's really not the kind of program that C was intended for.

Also, C is supposed to scale down to //really// simple systems. Systems that lack indirect addressing modes, caches, MMUs, etc. It is literally intended to be a thin veneer over actual assembly for those systems, which is why so many operations are specified in terms of /minimum standard unit size/ (for portability of that almost machine code between systems).

What you advocate is more like what C++ actually /should/ have been; a reason to use something more than C to gain advances in safety and ease of design.

>I suspect this has a -lot- to do with performance.

It's questionable whether people wanted that performance though, at least when it resulted in less security. About bounds checking in ALGOL 60: https://en.wikipedia.org/wiki/Bounds_checking

> A consequence of this principle is that every occurrence of every subscript of every subscripted variable was on every occasion checked at run time against both the upper and the lower declared bounds of the array. Many years later we asked our customers whether they wished us to provide an option to switch off these checks in the interest of efficiency on production runs. Unanimously, they urged us not to—they already knew how frequently subscript errors occur on production runs where failure to detect them could be disastrous.

> It's questionable whether people wanted that performance though, at least when it resulted in less security.

There's no question about it, the "ANSI C Rationale" makes it very clear what they considered "the spirit of C"[1]:

> - Trust the programmer.

> - Don't prevent the programmer from doing what needs to be done.

> - Keep the language small and simple.

> - Provide only one way to do an operation.

> - Make it fast, even if it is not guaranteed to be portable.

> The last proverb needs a little explanation. The potential for efficient code generation is one of the most important strengths of C. To help ensure that no code explosion occurs for what appears to be a very simple operation, many operations are defined to be how the target machine's hardware does it rather than by a general abstract rule. An example of this willingness to live with what the machine does can be seen in the rules that govern the widening of char objects for use in expressions: whether the values of char objects widen to signed or unsigned quantities typically depends on which byte operation is more efficient on the target machine.

> One of the goals of the Committee was to avoid interfering with the ability of translators to generate compact, efficient code. In several cases the Committee has introduced features to improve the possible efficiency of the generated code; for instance, floating point operations may be performed in single-precision if both operands are float rather than double.

[1] http://www.lysator.liu.se/c/rat/title.html Quoted section is found here: http://www.lysator.liu.se/c/rat/a.html#1
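The char-widening rule quoted from the Rationale can be observed directly. A minimal sketch, assuming 8-bit chars and a typical two's-complement implementation; the function names are hypothetical:

```c
#include <limits.h>

/* Widen a char holding the bit pattern 0xFF to int.
   Where plain char is signed this typically yields -1;
   where plain char is unsigned it yields 255. */
static int widen_high_char(void)
{
    char c = (char)0xFF;  /* implementation-defined result if char is signed */
    return c;             /* implicit widening from char to int */
}

/* Report which choice this implementation made, using CHAR_MIN from <limits.h>. */
static int plain_char_is_signed(void)
{
    return CHAR_MIN < 0;
}
```

The same source therefore produces different values on different targets, which is exactly the "live with what the machine does" trade-off the Rationale describes.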

"The block structure of ALGOL 60 induced a stack allocation discipline. It had limited dynamic arrays, but no general heap allocation. The substantially redesigned ALGOL 68 had both heap and stack allocation. It also had something like the modern pointer type, and required garbage collection for the heap. The new language was complex and difficult to implement, and it was never as successful as its predecessor."

-- http://www.memorymanagement.org/mmref/lang.html

Adding runtime bounds checking of automatic storage arrays (i.e. arrays on the stack) is relatively easy in C, at least until the compiler runs into illegal type punning. The real problem in implementing these compiler safeguards comes with crossing translation units, or with heap blocks. There's a reason languages like Rust and Go rely heavily on static linking and stack allocation; it's more difficult or more costly to implement those safeguards when the compiler can't see all the source code, or pointers pass through an opaque layer. Nothing in C precludes automatic bounds checking of all array access, via fat pointers or lookup tables. Fabrice Bellard's Tiny C compiler implemented precise bounds checking for both automatic and dynamic storage-allocated objects a decade before UBSan and ASan. Even deriving an invalid pointer crashed the app at the precise point where it happened. That widely-used C compilers don't do that is a strong hint there are other, real-world constraints in place.

Also, in a language like Java it's not uncommon to see people reinventing dynamic heap allocation using char arrays, susceptible to all the same overflow problems. When you see people doing that, that should be a hint that a language like C might work well.

I don't understand all the C hate. Then again, I have no problem employing various languages according to the task, or creating DSLs. I suppose if I was wedded to a single language or to the idea of a single language, C would look much worse to me.

> There's a reason languages like Rust and Go rely heavily on static linking and stack allocation

This is untrue: Rust certainly does not rely on static linking for any of its default optimisations, nor is there a difference between putting an array on the stack or on the heap. While it is true that code can benefit from whole-program optimisation, it isn't the default in either language, just like it isn't the default in C.

Languages which bake in automatic bounds checking at every access rely on optimization to recover the performance hit. Without static linking, automatic GC, and other constructs that's very difficult.

LTO notwithstanding, once you add those more sophisticated constructs, iterating the language becomes more difficult. You don't hit upon the best method for implementing various types the first time, or the second time, or even the third time. glibc is backwards compatible for programs compiled over 15 years ago (GCC's fixinclude hacks notwithstanding). You'll never see that with Rust's or Go's standard library, just like you never saw that with C++.

My point wasn't that static linking was necessary. My point was that static linking is indicative of other tradeoffs that most people don't understand. Static linking isn't just about making packaging easier. It's also about making it easier to write and implement the compiler and standard environment.

My more abstract point is that people who think C is on its last legs don't understand the whole picture. There's nothing intrinsic to C that makes it unsafe. Fabrice's compiler was perfectly capable of implementing the C standard to the letter. What makes C unsafe are the requirements found in the niches where C exists, and those requirements don't magically disappear because the name of the language changes.

Rust supports unsafe code, but implementing code in Rust which is rigorously robust in the face of OOM situations, or where you need to implement use-case-specific memory management strategies, requires relying almost exclusively on unsafe code. (Try using Rust without boxing, for example, as is necessary if you want to catch OOM.) If you don't need those things, you probably don't need a low-level language, either. I love C, but I also love languages like Lua with lexical closures and stackless coroutines. To me, languages like Rust and even C++ exist at a middle ground that is very unappealing to me.

C isn't standing still, either. Strategies like SafeStack (see http://dslab.epfl.ch/proj/cpi/) can provide substantially the same safety guarantees as Rust in terms of real-world attack vectors, without having to modify any existing C software, and without giving up performance.

None of this is to say languages like Rust are useless. Just that the harms and inevitable demise of C per se are, IMHO, greatly exaggerated. And if and when a language like Rust grows in usage, I doubt it will supplant C so much as open and populate virgin territory.

The lack of bounds checking is one of the biggest problems in C, but there are worse problems (use after free) that nobody has even thought of a solution for.

> That widely-used C compilers don't do that is a strong hint there are other, real-world constraints in place.

Yes. Those constraints are self-inflicted wounds caused by the fact that C wasn't designed for this. If you have a proper iterator API, a culture of unsigned array indexing, widespread use of a size_t equivalent instead of int for loops, etc. etc. these issues vanish.
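To illustrate the point about unsigned indexing, here is a small sketch (function names are hypothetical). With a signed index, a bounds check has to test both ends; with a `size_t` index, a single comparison is enough, because a would-be negative value converts to a very large one.

```c
#include <stddef.h>

/* Signed index: both the lower and the upper bound need a test. */
static int in_bounds_signed(int i, int len)
{
    return i >= 0 && i < len;
}

/* Unsigned index: a "negative" value converted to size_t becomes huge,
   so the single comparison i < len rejects it as well. */
static int in_bounds_unsigned(size_t i, size_t len)
{
    return i < len;
}
```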

> Also, C is supposed to scale down to //really// simple systems. Systems that lack indirect addressing modes, caches, MMUs, etc. It is literally intended to be a thin veneer over actual assembly for those systems, which is why so many operations are specified in terms of /minimum standard unit size/ (for portability of that almost machine code between systems).

This is C snake oil, sold by a C community that usually omits the fact that computers built a decade before the PDP-11, like the Burroughs machines, already had much better systems programming languages such as ESPOL and NEWP, two Algol derivatives.

Algol 68 already had slices and modules (Algol 68RS) among many other nice features, running on early-'70s hardware.

There are a few other examples.

> Maybe you can forgive C for the stupid header compilation model (why let the compiler do what you can make the programmer do by hand?).

This model enables binary-only distribution of libraries: you get the code as a .a (or .lib, .so, .dll or whatever) and the API declaration as a header file.

You can write code against a library without having the library, using only the header. You can't do the final linking of course, but you can write the code.

The alternative, I guess, would be to embed this information in the library itself, and have the compiler extract it, which sounds as if it would have been scary from a performance point of view 40 years ago (and also somewhat hard).

Why do you think that parsing a compiler-generated binary representation of an API is more expensive than parsing a human-readable textual representation of the same API? Also, have you ever built a large template heavy C++ program?
When I've mentioned what you propose to other programmers I always get baleful looks and a statement that 'that's not how C is supposed to work'. Mention bounds checking, same thing.

I used to think it was hopeless, especially as each new language that came out required garbage collection or, worse, targeted the JVM. Perhaps cloud services will motivate people to fix this stuff, since computing costs are now a hard line item on the books.

It sure wasn't a bad design for Unix Implementation Language (tm)
> C was a pretty good language in 1978. We didn't know a lot of things in 1978 that we do now in 2016. It now makes sense to revisit those decisions in light of nearly 40 years of practice.

We surely did know that Burroughs was selling an operating system written in ESPOL (later NEWP) in 1961. Nowadays Unisys still sells it as MCP.

We did know that the Flex machine was written in ALGOL 68RS in 1980.

We did know that VME was written in S3 in 1970.

We did know that Pilot was written in Mesa in 1977.

We did know that Lilith was written in Modula-2 in 1997.

There are lots of other examples.

The main difference was that the source code for UNIX, and consequently C, was available for free because AT&T could not sell it, while people had to pay for the other ones or they were behind research walls.

Correction to myself, Lilith was written in Modula-2 in 1977.
It's remarkable, though, that it's taken 40 years to make much headway in replacing C.

There's still a lot of C code out there, and a lot of new C code still being written.

> It's remarkable, though, that it's taken 40 years to make much headway in replacing C.

I don't agree: it's been a long process, but the trend is unmistakable. It's hard to remember now, but in the early '90s C and C++ were completely dominant. Nowadays they're much more specialized: you're as likely to build your company on Java or even Python/Ruby as you are to build it on C++. People talk about how it's hard to hire C++ engineers nowadays, while in the '90s "C++ engineer" was pretty much synonymous with "programmer". And so on.

That's quite normal, because the market today is much larger and more diverse. For example, it doesn't make any sense to build web apps in C or C++, and this type of software is very widespread nowadays but was virtually non-existent back then.

The interesting question is what will mobile devices, the IoT and embedded devices in general be programmed in? C and C++ are popular choices today, so the trend is not really "unmistakable".

Yes, but the majority of those devices, at least the ones with enough KB, use C or C++ for hardware integration or some interpreter, with the rest of the stack in something else.

Everything else usually has other languages available as well, one just needs to search for what is out there.

> Nowadays they're much more specialized: you're as likely to build your company on Java or even Python/Ruby as you are to build it on C++.

Some of today's dominant platforms are developed mostly in Java (see Android), and web development targets the LAMP stack. This means that business is concentrated in ventures that exclude most languages, not that the alternatives have more technical merit.

I'm sure it's possible to gather some individuals that are more than willing to badmouth Java and Python with a passion with the same ease we see here people complaining about C.

Guess which language was used to write your OS, browser, games, compilers, runtimes and so on. If you count the number of hours that people spend on software written in C/C++, I think all other languages would be left in the dust.
Lots more will be written until there is a viable replacement for C in embedded systems.
A few alternatives that can be bought today:

Oberon for ARM Cortex-M4, Cortex-M3 Microcontrollers and Xilinx FPGA Systems

http://www.astrobe.com/default.htm

Pascal and Basic for lots of micro and pico-processors

http://www.mikroe.com/compilers/

Ada

http://www.ghs.com/products/ada_optimizing_compilers.html

http://www.ptc.com/developer-tools/apexada

http://www.adacore.com/

Java for MCUs

http://www.microej.com/stmicroelectronics/

Exactly. It's beyond my authority to change the language in use, but I would love to have alternatives to argue for. C isn't a terrible option, or is perhaps the least terrible option, but it's not leaving my corner of the corporate universe until a proven alternative establishes itself.
What in particular makes Rust (or Go or Swift) unsuitable?
GC languages are almost certainly not even in consideration for most embedded applications. There aren't good strategies for general garbage collection that don't insert random pauses for starters, and you also have a non-negligible impact on RAM usage on systems where RAM might be a premium.

A lot of the things Rust brings to the table aren't always relevant on embedded platforms. Dynamic memory allocation on embedded is the exception, not the rule. Everything is statically allocated, so memory management is relatively simple -- everything sticks around forever.

The things that make C/C++ good for embedded are sorta what make it unfortunate for general purpose use. The things that make Rust/Go/Swift good for general purpose use make it unfortunate for embedded use.

> A lot of the things Rust brings to the table aren't always relevant on embedded platforms. Dynamic memory allocation on embedded is the exception, not the rule. Everything is statically allocated, so memory management is relatively simple -- everything sticks around forever.

In that case, you can simply not use the dynamic allocation features of Rust, just as you can simply not use malloc() in C.
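As a concrete picture of "simply not using malloc()" in C, here is a minimal sketch of a statically allocated byte queue of the kind often seen in embedded code. The capacity and all names (`RING_CAPACITY`, `ring_push`, `ring_pop`) are hypothetical choices for illustration, and the sketch ignores interrupt safety.

```c
#include <stddef.h>
#include <stdint.h>

#define RING_CAPACITY 8u  /* hypothetical size, fixed at compile time */

/* All storage lives in one static object; nothing is ever malloc'd or freed. */
static struct {
    uint8_t data[RING_CAPACITY];
    size_t  head;   /* index of the oldest byte */
    size_t  count;  /* number of bytes currently stored */
} ring;

/* Returns 1 on success, 0 if the buffer is full. */
static int ring_push(uint8_t byte)
{
    if (ring.count == RING_CAPACITY)
        return 0;
    ring.data[(ring.head + ring.count) % RING_CAPACITY] = byte;
    ring.count++;
    return 1;
}

/* Returns 1 and stores the oldest byte in *out, or 0 if the buffer is empty. */
static int ring_pop(uint8_t *out)
{
    if (ring.count == 0)
        return 0;
    *out = ring.data[ring.head];
    ring.head = (ring.head + 1) % RING_CAPACITY;
    ring.count--;
    return 1;
}
```

The memory footprint is known at link time, and the full and empty cases are handled by return codes rather than by allocation failure.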

  > GC languages
Rust is not a GC'd language, it essentially uses RAII to deterministically determine at compilation time when memory will be freed.

  > Dynamic memory allocation on embedded is the exception
Rust fully supports running entirely without dynamic allocation. There is a subset of the standard library defined explicitly for this purpose.
Doesn't the design of Go, the language, basically require that the implementation involve a runtime with a GC system? So that would make it a non-viable choice for programming in applications where memory footprint and real-time performance must be tightly controlled.

Rust is a better choice, and it's designed with this in mind. It's still young, though, and I think there might be some as-yet-unsolved issues (these are things I've vaguely heard of and could be totally off-base) like binary size, ease of dealing with raw pointers, etc.

If I was doing this sort of programming for a personal project, I'd probably try using Rust, because I like it.

Dunno about Swift, though IIRC the current reference implementation may also currently rely on GC.

  > like binary size, ease of dealing with raw pointers, etc
The majority of the size of typical Rust binaries is the huge amount of space (400 KB or so) that it takes to statically link jemalloc. But if you're building for a device that doesn't support dynamic allocation then you're not going to be including jemalloc, so binary size shouldn't be a problem.

As for raw pointers, they're exactly as capable as raw pointers in C, though they're deliberately more verbose as well, because even in embedded contexts one should be favoring references over raw pointers, since references are still fully checked for safety even in embedded mode and yet are represented by raw pointers at runtime and hence have zero runtime overhead.

Rust's binary size issues are just a matter of defaults, and they're an additive bloat, not multiplicative (if the corresponding Rust binary for a 5kb C binary is 200kb, then the corresponding Rust binary for a 100kb C binary is 295kb). Most people don't care about a few extra kilobytes in their binary, so Rust has chosen defaults that make some things easier but also add some extra binary size. You can turn these off and get tiny binaries if you wish without much effort.
Ironic, but the one thing they are missing is an easy, convenient way to call C libraries.

Rust is a very promising language, but unless you plan to write everything from scratch, you have to depend on 3rd party driver implementations for most stuff. Databases for example. There isn't a single database vendor that has Rust drivers.

Calling C from Rust is very convenient. All you have to do is declare the structs and function signatures and then it's like calling any other unsafe function.
Go is unsuitable as a replacement for C because it is garbage collected. End of discussion.

Rust is unsuitable as a replacement for C because its memory management is poorly thought out (i.e. it's a joke). Here are the relevant paragraphs from the Rust FAQ. Really?

"Rust avoids the need for GC through its system of ownership and borrowing, but that same system helps with a host of other problems, including resource management in general and concurrency.

For when single ownership does not suffice, Rust programs rely on the standard reference-counting smart pointer type, Rc, and its thread-safe counterpart, Arc, instead of GC.

We are however investigating optional garbage collection as a future extension. The goal is to enable smooth integration with garbage-collected runtimes, such as those offered by the Spidermonkey and V8 JavaScript engines. Finally, some people have investigated implementing pure Rust garbage collectors without compiler support."

> For when single ownership does not suffice, Rust programs rely on the standard reference-counting smart pointer type, Rc, and its thread-safe counterpart, Arc, instead of GC.

This is a library thing, not a language thing.

If single ownership is enough for you, go ahead and use it. But if you need a different memory management strategy, that is available too.

Rust, the language, provides a single clear memory management strategy. It also provides the ability to design your own abstractions for different strategies, and implements some of these in the stdlib.

C/C++ have refcounting and GC libraries too. Does that make them a joke?

You don't describe why you think it's a joke.
This post is a joke without a material argument to support your claim.
> C was a pretty good language in 1978. We didn't know a lot of things in 1978 that we do now in 2016.

We also have nearly 40 years of infrastructure built on C, which needs to be maintained and updated.

This is the same old argument advocating for rewriting everything from scratch just because someone somewhere managed to develop a new flavor of the month.

There are plenty of reasons why the whole world still has a heavy demand for COBOL and FORTRAN developers, and the development of a new flavor of the month isn't a good enough reason to eliminate this demand.

> This is the same old argument advocating for rewriting everything from scratch just because someone somewhere managed to develop a new flavor of the month.

I'm not saying rewrite everything for no reason. I'm saying that there are reasons, and we've gotten a very good idea of what those are over the last 40 years.

C can be replaced by C++ with a reasonable amount of effort.

But C++ won't be easy to replace, and I'm not sure it needs to be, since rewrites are highly risky, time consuming and disruptive. With some luck and depending on how the language evolves we might be moving from C++ to a safer C++.

I agree, and that was one of my motivations to adopt C++ instead of C when Turbo Pascal wasn't any longer an option.

But using the C++ features that make it safer than C is only an option in small, security-motivated teams.

Sadly the majority of C++ teams, at least in the enterprise space, tend to use it as "C with classes", thus voiding most improvements the language has to offer over plain C.

You've hit on an important point - culture. Every programming community has it and it can enhance or hinder the adoption and usability of a language.

C++ is split between multiple factions. I'm doubtful that the one programming in C with classes is interested in learning e.g. Rust.