by nickpsecurity 3740 days ago
"C is a systems language from the ground up"

It was actually an extension of BCPL, which wasn't designed: it was just whatever parts of a good language would compile on 1960s hardware. Proof below.

http://pastebin.com/UAQaWuWG

"with decades of successful use"

It actually had decades of failures, with all sorts of bugs and hacks that safer systems languages dodged by design. Only the best coders got successful and secure use out of it. We praise OpenBSD quality for a reason: it's not easy.

"Rust is the new kid on the block with everything to prove. "

This is true. I have a rule against using anything new for security-critical coding if it's in the TCB. It takes time to discover all the issues in things.

"With only two holes in the default install in over a decade"

Propaganda I've called out plenty. On other systems, people finding bugs often weaponize them, declare a vulnerability, and add that to the count. OpenBSD treats bugs as just bugs, then fixes them while assuming its mitigations stopped any attack attempts. It's easy to say you only had 2 vulnerabilities when you're not counting vulnerabilities. ;)

"C has proven its worth with billions of lines of code, something Rust will likely never achieve as a niche language."

It does have proven worth. After billions of lines, you can be sure you'll be fixing all sorts of things and doing breach notifications if you rely on it. Unless you pay extra money for top coders. Rust already beats it on app-level safety, w/ the effects of low-level interactions and compiler risk being the next things to assess or address. Ada and SPARK beat both for systematic safety, with many empirical results from case studies and field use. Safe versions of C like Cyclone and Popcorn outdid C in security, too, but nobody invested more in them. TAL and CoqASM are even doing safety/security at the assembler level.

And so we have a language proven worthless for quality or security as the mainstay of quality- or security-focused UNIXen, even with decades of alternatives empirically shown to be better. Sounds like a cultural thing to me. A drawback, too.

Only advantages: lots of people know it and there's lots of existing code/tooling. Those are valid reasons to choose it for existing BSD code, but one should still admit that it's inferior on other angles, and that rewrites into safer languages, or their use for new projects, should be ongoing.


I am genuinely wondering, what is (or could you point me to) the alternative with the following properties:

- Compiled, type-safe and available for armv6.

- Simple semantics: Rust and Ada are complex (C++-ish) and it gets hard to limit the number of memory allocations/accesses as well as data copies going on.

- Tooling and discoverability: man pages and Emacs with a few modes that are easy to set up beat anything I've tried so far.

I understand that C has shortcomings when it comes to safety/security and even lacks features that would make programming certain things easier, but what do you suggest I use when I want to write a UNIX daemon that needs to transfer a boatload of data from disk over the network and vice-versa?

I personally like it, I find it to be clear and concise, a little tedious but at the price of giving me fine grained control over the data in memory: I just have to be careful with that.

  > it gets hard to limit the number of memory allocations/accesses
  > as well as data copies going on.
I would be interested in hearing more about this, if you have the time.
In the example at this point in a video[0] on Rust, I could not tell directly whether the Vec structure is being copied into the take() function. If ownership is being handed over, why is a copy being made? And if a copy really isn't being made, then why aren't the function signature and the calling site reflecting that?

I hope I am making any sense... Please tell me if I should try to rephrase, or if I am in need of clarification on the subject.

(BTW, this is also something I really dislike about C++, two function calls that look exactly the same, foo(x) and bar(x), could either be pass by value or pass by reference: you have to go dig up the function signatures to figure that out).

[0] https://youtu.be/O5vzLKg7y-k?t=1110

Gotcha. I thought the video would have explained this, but since it didn't get through, let me try :)

  fn take(vec: Vec<i32>)

In Rust, the default way that things operate is to _move_. So when you call take(), we'd say that the vector moves into the function.

Moving is always a memcpy. There's no way to change this behavior, so you can always know that it's true. But with a Vec, there's a subtlety: the Vec itself is three words: a pointer to the heap, a length, and a capacity. So when I say that the Vec is memcpy'd, I mean literally those three things. It doesn't try to follow the pointer and copy the data that it points to.

Incidentally, that pointer is why we say that it moves: because two copies of the Vec structure itself would mean an aliased pointer to the heap, the compiler doesn't allow the use of the older binding after take() is called.
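
A minimal sketch of the move described above, in case it helps to see it checked in code. The helper names here (this variant of take that reports an address, move_keeps_heap_buffer, vec_is_three_words) are made up for illustration, and the three-word size is how current standard library implementations lay out Vec rather than something the language promises:

```rust
use std::mem::size_of;

/// Takes ownership of the vector; returns the address of its heap buffer
/// and its length so the caller can compare them.
fn take(vec: Vec<i32>) -> (usize, usize) {
    (vec.as_ptr() as usize, vec.len())
} // `vec` is dropped here, which frees the heap buffer

/// True if the heap buffer sits at the same address before and after the move.
fn move_keeps_heap_buffer() -> bool {
    let v = vec![1, 2, 3];
    let before = v.as_ptr() as usize;

    let (after, len) = take(v); // `v` moves into `take`

    // println!("{:?}", v); // would not compile: `v` was moved

    before == after && len == 3
}

/// True if the Vec handle itself is pointer + length + capacity.
fn vec_is_three_words() -> bool {
    size_of::<Vec<i32>>() == 3 * size_of::<usize>()
}
```

Both functions should return true: only the small handle travels into take(), and the elements stay where they were on the heap.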

For simpler types, like integers:

    fn take(i: i32)
they implement a trait called Copy. This means that when you call take, the i32 is copied, in the same fashion that the Vec was copied. The only difference is that there's no problem with having these two copies, as an i32 is, well, just an i32. So the compiler won't prevent the use of the binding outside the function like it would with a non-Copy type.

References are like pointers, but with the extra static checking that Rust does.

    fn take(vec: &Vec<i32>)
References also implement Copy, and so when you pass a reference to a Vec to this function, the same thing happens as in the i32 case. The reference itself gets copied.
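
Here is a small sketch of both Copy cases, assuming nothing beyond the standard library. The names take_int, take_ref and copy_demo are hypothetical, picked only for this example:

```rust
/// Receives its own copy of the integer.
fn take_int(i: i32) -> i32 {
    i + 1
}

/// Receives its own copy of the reference; the Vec itself is not copied.
fn take_ref(vec: &Vec<i32>) -> usize {
    vec.len()
}

fn copy_demo() -> (i32, i32, usize, usize) {
    let i = 41;
    let j = take_int(i); // `i` is copied in

    let v = vec![1, 2, 3];
    let r = &v;
    let n1 = take_ref(r); // the reference is copied in
    let n2 = take_ref(r); // `r` is still usable afterwards

    (i, j, n1, n2) // and so is `i`
}
```

Neither i nor r is invalidated by the calls, which is exactly what a non-Copy type like Vec would not allow.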

This is sort of a long-winded way of saying that Rust is "pass by value", not "pass by reference." Some people like to call this "pass reference by value", since while it's always pass by value, references are themselves values.
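
To make the "pass reference by value" wording concrete, a rough sketch follows. The function names bump_copy, bump_through and pass_by_value_demo are invented for this example:

```rust
/// Gets its own copy of `n`; changing it does not affect the caller.
fn bump_copy(mut n: i32) -> i32 {
    n += 1;
    n
}

/// Gets a reference (itself a value) and writes through it.
fn bump_through(n: &mut i32) {
    *n += 1;
}

fn pass_by_value_demo() -> (i32, i32, i32) {
    let mut a = 1;

    let returned = bump_copy(a); // works on a copy of `a`
    let after_copy = a;          // still 1

    bump_through(&mut a);        // the caller must write `&mut` explicitly

    (returned, after_copy, a)
}
```

The call site shows which of the two is happening: bump_copy(a) cannot touch a, and the only way to let a function modify it is to hand over &mut a in plain sight.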

What if we _did_ want to make a full copy of the Vec, including its elements? For that, we have to use the .clone() method.

    let v = vec![1, 2, 3]; // some Vec<i32>
    let v1 = v.clone();
Now, v and v1 will be full, independent copies of the same thing. In other words, if you don't see .clone(), it will never be a deep copy.

    let v = vec![1, 2, 3]; // some Vec<i32>
    let v1 = v; // a move, always a shallow memcpy
    let v2 = v1.clone(); // a deep copy
You can always see, syntactically, when you're making a possibly expensive copy of something.
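
A sketch that checks the difference, assuming only the standard library (clone_demo is a made-up name):

```rust
fn clone_demo() -> (bool, Vec<i32>, Vec<i32>) {
    let v = vec![1, 2, 3];
    let v1 = v;              // a move: same heap buffer, `v` is no longer usable
    let mut v2 = v1.clone(); // a deep copy: new heap buffer

    // The two vectors own different allocations...
    let separate_buffers = v1.as_ptr() != v2.as_ptr();

    // ...so changing one does not affect the other.
    v2.push(4);

    (separate_buffers, v1, v2)
}
```

After the push, v1 still holds 1, 2, 3 while v2 holds 1, 2, 3, 4.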

Does that help? I'm happy to elaborate further.

It was a great detailed explanation that also illustrates why I've held off on Round 2 of reviewing the docs. I mean, I read this...

"This is sort of a long-winded way of saying that Rust is "pass by value", not "pass by reference." Some people like to call this "pass reference by value", since while it's always pass by value, references are themselves values."

...and instantly have to focus hard to make sure I'm not slipping on the basic concepts. The wording of many Rust descriptions seems unnecessarily confusing to the point that I just googled value vs reference semantics to make sure my memory wasn't screwed up [more]. In a normal 3GL, pass by reference basically stores a pointer in a variable and passes that value somewhere. That value/reference can be used to modify original data outside its original function. Is that what Rust does? If so, it's just pass by reference "with (caveats/rules here)." End of story. Otherwise, I'll see if I can guess a wording that doesn't merge opposite concepts.

This concept isn't Rust specific, it's a general PLT thing. Rust is the same as C in this regard. Almost every language is pass by value these days. IIRC Fortran is pass by reference, an exception.

Yes! That definitely reinforces my understanding. So Rust is more like C and less like C++ in that it is pass-by-value everywhere, with the exception that references are not pointers but _real_ references.

Does this make things like "const int * const x" (in C syntax) pointless in Rust?

Does that mean that in Rust I can only pass to a function an immutable reference to a mutable object, but not a mutable reference to an immutable object?

Also, out of curiosity, what would something like a recv() call into the middle of an array (buffer) look like in Rust? Something like "recv(sock_fd, buf + 5 * sizeof(char), buf_len - 5 * sizeof(char), 0)"?

Great :)

  > Does this make things like "const int * const x"
  > (in C syntax) pointless in Rust?
Checking myself with cdecl, because I _always_ get this wrong:

  > declare x as const pointer to const int
This is

  let x: &i32;
(using i32 because it's the default int type, even though it's different than C's int.)

The variations are:

  let x: &i32; // an immutable binding to an immutable reference
  let mut x: &i32; // a mutable binding to an immutable reference
  let x: &mut i32; // an immutable binding to a mutable reference
  let mut x: &mut i32; // a mutable binding to a mutable reference
More mutability == longer declaration, roughly.
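
A short sketch of what two of those variations allow and forbid. The function name binding_demo and the variables are made up for illustration; the commented-out lines are the ones the compiler would reject:

```rust
fn binding_demo() -> (i32, i32, i32) {
    let mut a = 1;
    let b = 10;

    // Immutable binding to a mutable reference:
    // we can write through it, but cannot point it somewhere else.
    let x: &mut i32 = &mut a;
    *x += 1;
    // x = &mut some_other_int; // would not compile: `x` is not a `mut` binding

    // Mutable binding to an immutable reference:
    // we can point it somewhere else, but cannot write through it.
    let mut y: &i32 = &a;
    let first = *y;
    y = &b;
    let second = *y;
    // *y += 1; // would not compile: `y` is a `&` reference

    (a, first, second)
}
```

The mut on the left of the colon is about the binding, and the mut after the & is about what the reference lets you do to the thing it points at.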

  > Does that mean that in Rust I can only pass to a function an immutable reference
  > to a mutable object, but not a mutable reference to an immutable object?
You're right in both. No problem treating something mutable as immutable, but treat something immutable as mutable and you get a compiler error.
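
In code, that might look like the following sketch (read_only, push_one and mutability_demo are hypothetical names for this example):

```rust
/// Only needs to look at the vector.
fn read_only(v: &Vec<i32>) -> i32 {
    v.iter().sum()
}

/// Needs permission to change the vector.
fn push_one(v: &mut Vec<i32>) {
    v.push(1);
}

fn mutability_demo() -> i32 {
    let mut m = vec![1, 2];
    push_one(&mut m);          // mutable object, mutable reference: fine
    let total = read_only(&m); // mutable object, immutable reference: fine

    let frozen = vec![1, 2];
    // push_one(&mut frozen);  // would not compile: `frozen` is not declared `mut`

    total + read_only(&frozen)
}
```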

  > Also, out of curiosity,
Rust has a concept called "slices". These are "fat pointers", that have both a pointer and a length inside:

  let x = vec![1, 2, 3, 4, 5];
  let slice = &x[1..3];
Here, 'slice' will be a pair, (ptr, len), so the ptr will point to the interior of the vector, and the length will be 2. If we printed it, we'd see "2, 3".
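
A quick way to check the (ptr, len) picture, as a sketch that assumes only the standard library (slice_demo is a made-up name):

```rust
use std::mem::size_of;

fn slice_demo() -> (usize, Vec<i32>, bool, bool) {
    let x = vec![1, 2, 3, 4, 5];
    let slice = &x[1..3];

    // The pointer half of the slice points at x[1], inside the vector's buffer.
    let points_inside = slice.as_ptr() == &x[1] as *const i32;

    // The slice reference itself is two words: a pointer and a length.
    let two_words = size_of::<&[i32]>() == 2 * size_of::<usize>();

    (slice.len(), slice.to_vec(), points_inside, two_words)
}
```

The length comes back as 2, the elements as 2 and 3, and both checks hold: no elements were copied to create the slice.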

So let's check out the signature of recv:

  ssize_t recv(int sockfd, void *buf, size_t len, int flags);
This pointer, length pair looks suspiciously like buf and len here. That's for good reason. A first step towards a more Rust-like wrapper over recv would look like this:

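  fn recv(socket: libc::c_int, buf: &mut [u8], flags: libc::c_int) -> libc::ssize_t {
    // A slice carries both the pointer and the length that the C API wants.
    let ptr = buf.as_mut_ptr() as *mut libc::c_void;
    let len = buf.len() as libc::size_t;

    // Call the C function by its full path; a bare `recv` would refer to this wrapper.
    unsafe { libc::recv(socket, ptr, len, flags) }
  }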
which would end up being called like this:

  recv(sock_fd, slice, 0);
combining the names from your example and mine from the slice above, heh.

The next step would be to turn `flags` into an enum, and take that as an argument rather than a C int. Then you'd want to convert the return type to a Rust integer rather than a C one... eventually, you end up with the API we have in the standard library, which looks like this for a udp socket, for example:

  use std::net::UdpSocket;

  {
      let socket = UdpSocket::bind("127.0.0.1:34254")?;

      // read up to ten bytes from the socket
      let mut buf = [0; 10];
      let (amt, src) = socket.recv_from(&mut buf)?;

  } // the socket is closed here
A few comments on this:

We only said &mut buf, but I showed you syntax with [] above. When you pass a reference to an array or vector, and the function is expecting a slice, it will automatically convert to a slice of the full length. To be more accurate to your original question:

      let (amt, src) = socket.recv_from(&mut buf[1..5])?;
or whatever middle part of the buffer you want.
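
Since a real socket needs a peer on the other end, here is a sketch of the same "read into the middle of a buffer" idea using std::io::Read instead. The names read_into_middle and read_into_middle_demo are invented for this example, and a byte slice stands in for the socket:

```rust
use std::io::{self, Read};

/// Reads from `src` into `buf` starting at `offset`, leaving the first
/// `offset` bytes untouched. Panics if `offset > buf.len()`.
fn read_into_middle<R: Read>(src: &mut R, buf: &mut [u8], offset: usize) -> io::Result<usize> {
    // The sub-slice carries its own pointer and length, so this plays the
    // role of `buf + offset, buf_len - offset` in the C call.
    src.read(&mut buf[offset..])
}

fn read_into_middle_demo() -> io::Result<(usize, [u8; 8])> {
    let mut src: &[u8] = b"abc"; // byte slices implement Read
    let mut buf = [0u8; 8];
    let n = read_into_middle(&mut src, &mut buf, 5)?;
    Ok((n, buf))
}
```

Unlike the pointer arithmetic in C, an offset past the end of the buffer is caught by the slice's bounds check rather than turning into an out-of-bounds write.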

amt is the number of bytes read, and src is the address the data came from, in this particular API.

You'll notice this particular API doesn't expose flags: we try to Do The Right Thing in the standard library, so you don't worry about these. Here's the source of recv_from: https://github.com/rust-lang/rust/blob/master/src/libstd/sys... c::recvfrom is libc::recvfrom, rather than recv, technically. But as an example from a different API, when opening files, we set O_CLOEXEC where appropriate: https://github.com/rust-lang/rust/pull/27971

If you want the more exotic options, then you have to dig in and make the calls yourself. Such is the pain of a standard library, trying to make the common, good case good, but we still let you dig in and build a different abstraction if you don't like ours.

IIRC moving is a possibly-optimized memcpy, LLVM might remove it entirely.

Yes, I'm speaking purely semantically.

That statement didn't fully sink in the first time I saw it. I'm also curious as to what that was referring to. Especially since Ada/SPARK are mainly used in real-time, resource-constrained systems.

You might like Nim or Crystal?

I left Nim off on purpose because it wasn't simple or C-like at all. It's an interesting language that could be a C++ or C replacement, though.

The problem is, outside Ada and Rust, there isn't much of anything that's maintained because people kept rejecting anything but C. I will post this on Ada so you can see (a) a nice survey of problems that show up and (b) how it systematically counters them.

http://www.adacore.com/uploads/technical-papers/SafeSecureAd...

The best candidates for simpler ones were Wirth-like languages, esp Modula-3. I used to recommend Delphi as it was a Pascal alternative to Visual C++ whose apps rarely crashed. Free Pascal w/ the Lazarus IDE succeeded it, w/ tons of hardware support. Component Pascal w/ Blackbox is still active AFAIK. D is quite active.

Some Modula-2 benefits https://news.ycombinator.com/item?id=9640126

Modula-3 features https://en.wikipedia.org/wiki/Modula-3

Note: Fast to compile, fast to run, easy to read, easy to integrate, optional GC, optional OOP... why do we need C and C++ again? Outside legacy systems...

Free Pascal http://www.freepascal.org/

Component Pascal https://en.wikipedia.org/wiki/Component_Pascal

D language (C/C++ successor) https://en.wikipedia.org/wiki/D_%28programming_language%29

Julia http://julialang.org/

Note: It's a language for scientific programming but it's worth considering given speed and C support.

On the functional side, people are writing OSes in Haskell, OCaml, and so on. OCaPIC put OCaml on 8-bitters. ATS Language was used for drivers and 8-bitters. Red/System is like LISP w/out parentheses for system programming, with the ability to make DSLs. Any such language can have safety checks built in or output something for analysis. So, even functional languages are performing acceptably in places where C used to be required. We just need more people investing in any trouble spots.

Someone could also pick up the code of Popcorn, Cyclone, or another safer C to develop it. Cyclone is worth linking to as it was so clever:

https://en.wikipedia.org/wiki/Cyclone_%28programming_languag...

Just gotta maintain the front-end. Ivory language from Galois is still maintained & extracts to C. Tools like Softbound+CETS will autotransform your code to safety at a 10-40% performance hit. A typed assembler language like TALx86 or custom one from Hyde's HLA would give you lower level than C with more safety ironically. So, many options for OSS to build on with some like Pascals having mature tooling.

EDIT: Just remembered the Pike programming language used in Roxen web & app servers as a C alternative. They're FAST. So, do google it.

Wow, thank you for that reply!

I don't think that C stays alive because people just automatically rejected anything else. There is something to C that I appreciate extremely, and that is the clarity and simple semantics: there is not much hiding and unintentional obfuscation that one can cause when writing C code.

Maybe safety and simplicity are mutually exclusive if speed and fine-grained control over memory are the main goals.

For Modula, the tools and documentation are not being updated anymore, and I could not find a programming environment for, e.g., Emacs or for armv6.

Free Pascal and Lazarus seemed great until I realized there were no resources to learn modern Pascal from. The Emacs mode is too basic and doesn't take any advantage of things offered by fpc. But I have to admit, I've never seen anything as good as Lazarus before.

D is more like a C++ successor that's aiming at Java than a successor to C. It is complex and does not seem to be making any effort to unify its concepts, much in the spirit of C++.

Cyclone is definitely interesting, but unmaintained and not exactly simple, similarly to Rust but with a C-like syntax.

Julia crashes all the time with segmentation faults, sorry, the quality of Julia is super low. I would not use it for anything serious. It has great ideas though.

OCaml is super complex with its syntax and semantics.

I love Haskell, which makes great efforts to unify its concepts: the language is super simple (its implementation can be arbitrarily complicated). Haskell is where I go to write beautiful things. Fast (like, systems programming fast) Haskell code is ugly and unmaintainable and defeats the purpose of choosing Haskell in the first place.

I'll be taking a look at the rest! Thanks!

> crashes all the time with segmentation faults

We would very much appreciate any bug reports, even if you don't have the time to reduce (if you have filed under another username, thank you!).

> Julia crashes all the time with segmentation faults, sorry, the quality of Julia is super low. I would not use it for anything serious.

I've seen a lot of Julia used in real world scenarios at companies and this is a pretty surprising claim. Segfaults are very rare – much rarer than in code written in C or C++. You may either have a messed up build of Julia or you could be using packages that call C libraries and do so incorrectly – that would certainly cause segfaults.

Sorry for not being clear on this. I was not talking about Julia code crashing. What I meant was when playing around with it, the julia binary would segfault here and there.

echo "print(3)" > boot.jl

julia --compile=all -O --inline=yes --check-bounds=no --output-o foo.o

Stefan can tell me if my hunch is right, but check-bounds=no might turn off protections against out-of-bounds memory access. If so, then that command is straight up telling it to segfault.

"There is something to C that I appreciate extremely, and that is the clarity and simple semantics: there is not much hiding and unintentional obfuscation that one can cause when writing C code."

It is pretty straightforward. However, Modula-2 and the Pascals are even more so, as you can represent everything with a simple BNF grammar. There's more consistency, fewer corner cases, and so on. Every change they make is specifically designed to improve it while maintaining simplicity. The most complicated and modern one can be described fully in around 30 pages [1]. The uppercase and declarations throw people sometimes, but keep in mind people used text editors w/out syntax highlighting.

[1] http://www.oberon.ch/pdf/CP-Lang.pdf

Re mutually exclusive. Nah, there are still tradeoffs you can make. Cyclone and Rust both do that with better design decisions to allow safe, manual management. At some point, you have to choose one or the other though. Your language design matters at this point, where the compiler has to know it can remove a check. C's design is so rough and allows so much unpredictable behavior that this is an ongoing research problem for it. Whereas it happened with Modula-3 libraries in one academic project and happened for Ada with SPARK.

Re Modula. Oh yeah, it's not updated anymore. I was just illustrating a language as lean and efficient as C that was safer & easier to compile. A better foundation. It was rejected by C users when it did have compilers. Plus, there's a Modula-2 to C compiler floating around the net.

Re Free Pascal and Lazarus. Interesting feedback. I'll look into that to see if I or developers could remedy it. Except for Emacs: Lazarus is the IDE and better suited. Emacs support might stay dead.

Re D. Yes, it's more like C++ but can be used in many places where C is. I see you're looking for simple stuff, though.

Re Cyclone. It's not simple but not hard either. Remember C is deceptively simple: using it safely is HARD. Cyclone is slightly more complex but way easier to use. Unmaintained, yes, but people (esp GCC or LLVM types) could pick it up any day... if they weren't glued to C. Good call on the Rust connection.

https://doc.rust-lang.org/reference.html#appendix-influences

Re Julia. That's not good...

Re OCaml. I think you're focusing too much on complexity of semantics vs complexity of effective use. It's worthwhile to increase the learning curve a bit to boost productivity, safety, and maintenance.

Re Haskell. That reaction surprised me and you're the first to call it simple. I've been holding off cuz it seemed ridiculously hard and different. What did you use to learn it? Btw, I agree fast Haskell is usually ugly but look up Tolmach's Habit programming language. It's on hold right now cuz he had a better project.

Thanks :) You've given me a lot to go through over the weekend.

Regarding Haskell, I used LYAH and the standard library documentation.

I agree that Haskell is not easy to learn and that the libraries require a lot of foundation before becoming understandable and/or usable (eg, Monoids -> Applicatives/Monads vs. "I just want to read the contents of this file into a string"). The language "proper" is built on extremely simple and straightforward concepts.