Hacker News new | ask | show | jobs
by Manishearth 3542 days ago
One interesting thing about Rust is that none of the language features are really new. Even the borrow checker is from research papers and languages from quite a while ago. The only thing I can think of that might be truly unique to Rust is the concurrency safety model (Send+Sync), though that might be old too.

Rust has just managed to take all these features and put them together well, and strive to be more than a research language by working on things that would make others actually use the language.

(Of course, this particular feature is common in many, many languages)

7 comments

Yes, worth noting that this was an initial goal of Rust: to use research that was rarely implemented, but established and non-novel among CS academics. See http://tim.dreamwidth.org/1784423.html -- I remember some comment about this being part of the choice of the name, too, but I can't find anything specific or definitive about that.
The name "Rust" has so many origin stories; IIRC the creator kept changing the story each time he was asked :)
> Even the borrow checker is from research papers and languages from quite a while ago.

Eh, that's underselling Rust's contributions. Rust is more flexible than anything I know of when it comes to enforcing aliasing-xor-mutability. Cyclone for example was much more restrictive in disallowing aliasing (see Grossman's "Existential Types for Imperative Languages").

The key feature that Rust has is flow-sensitive permissions on unique loan paths, which is actually pretty novel as far as I'm aware.

I guess, yeah. I meant that what Cyclone does is along the same lines.

> The key feature that Rust has is flow-sensitive permissions on unique loan paths, which is actually pretty novel as far as I'm aware.

Huh, right.

Indeed. Regions are 20 years old. Substructural types are 30 years old. Type inference is 40 years old. Hierarchical module systems fancier than Rust's are at least 30 years old. With the exception of regions, all this stuff is older than I am, myself. But Rust took all these isolated good ideas and made a practical programming language out of it.
Ada and Pascal are not really research languages, though. But still, there are improvements taken from research, see Ada 2012 contracts for example.
Right, like I said, this particular feature exists in a million languages. I'm talking about the whole set of features that Rust has; some are from research languages and may have never been seen in the industry before (e.g. regions/borrow checking), but they're not really new. This particular feature in Rust descended from the ML family (since Rust used to be ML-like), which in turn probably got it from Ada or w/e.
ML predates Ada by about 7 years, so that's unlikely. The way it is implemented in ML is most likely just an implementation of the mathematical concept of sum types rather than a feature influenced by existing programming languages.
> ML predates Ada by about 7 years, so that's unlikely.

Ha! I didn't know that. Thanks :)

sorry if this is a dumb question, but what classifies a language as a research language?
That's a great question!

I would probably restate it though. What is research vs. a product. I just read about 1nm transistors, but we're probably looking at 10 years before Intel, et al, have built all the infrastructure to reliably deliver a CPU based on it.

In the case of a language, I would say it's similar, do you have the support infrastructure in place? Rust is amazing in this regard: Cargo, crates.io, docs.rs, rustup, etc. on top of that you have at least one large company and many others pushing the language in a large and distributed product.

I would classify Rust as a production ready coding platform.

Thanks!

It seems very obvious now. I was thinking the purpose of the language was for performing research, but that didn't make much sense. I see now that it's the language itself that is the subject of research, not the tool. English is fun.

There are lots of languages in the world. The majority of them are written by a single person for themselves or (if there's funding involved) a small group of people who deeply understand the compiler. These are either research, hobbyist, or company languages depending on the context of the author. They exist to explore a particular idea or solve a specific problem and don't aspire to be widely used.

If you hear the term used disparagingly it's because they tend to have issues you generally wouldn't want to put up with when you're on the clock: compiler bugs, lack of error messages, large missing pieces in the standard lib, spectacularly bad performance, etc because those weren't the problems the language was meant to explore.

Yup, a lot of people like to say X has Y, however execution of Y is just as important(if not moreso) than the initial idea of Y. Technical ideas don't thrive in isolation, they need to be nurtured and grown.

Just look at the dominance of Javascript in the programming landscape, you're seeing decades of high quality executions of the language(I include community in the "execution" definition) despite it being far from the best technical language.

the concurrency safety model (Send+Sync), though that might be old too

Smalltalk has had that since the early 1980s

Like steveklabnik, I'm extremely curious to hear more about how Smalltalk's concurrency model is just like Rust's. The latter is a fairly flexible model to defend against data races (which seems to be a concept only really formalised in the late 80s) that puts most of the power in the programmer's hands (i.e. no need for compiler-inserted locks on every object, etc.) that comes from a finely balanced combination of the trait system and the manner in which Rust controls mutation.

It would be great to see how other languages have achieved a similar balance, and my impression is that Smalltalk (and like most languages) does not put nearly as much effort into controlling mutability, but maybe I'm wrong. Could you post some links/a description that explores how Smalltalk achieves a similar level of safety?

"which seems to be a concept only really formalised in the late 80s"

http://brinch-hansen.net/papers/1975a.pdf

http://brinch-hansen.net/papers/

His first, concurrent OS was RC 4000 in 1969. It had many mechanisms in place. He got the key parts of the safety problem figured out by 1972. His language to handle much of it statically at compile-time was done in 1975. His Boss 2 system same year used coroutines + similar concepts to run 100 activities at a time with a proof of deadlock freedom. Finally, he used Concurrent Pascal to implement Solo system which ran its processes safely without using physical, memory protection. These summaries came from "Evolution of Operating Systems" on 2nd link.

So, the stuff was well-established by the mid-70's with operating systems using it in production. Just ignored by mainstream like a lot of good stuff for various reasons. ;)

I don't have time to read it in full right now, but a quick glance over that paper doesn't show any formalisation of data races, which is specifically what that parenthetical is about, not just general "concurrency safety". Having a detailed description of concept seems important because Rust is fairly precise in what it defends against, possibly giving the programmer more flexibility and power than in a system that attempts to solve many problems (but of course giving them less assistance when writing something that fits within the rules of the more defensive languages). Of course, it's possible that a language "accidentally" solves that problem without it having been formalised, it just seems less likely.
It's in the disk buffer and monitor examples. Here's a relevant quote:

"A disk buÆer is a data structure shared by two concurrent processes. The details of how such a buÆer is constructed are irrelevant to its users. All the processes need to know is that they can send and receive data through it. If they try to operate on the buÆer in any other way it is probably either a programming mistake or an example of tricky programming. In both cases, one would like a compiler to detect such misuse of a shared data structure."

"To make this possible, we must introduce a language construct that will enable a programmer to tell a compiler how a shared data structure can be used by processes. This kind of system component is called a monitor. A monitor can synchronize concurrent processes and transmit data between them. It can also control the order in which competing processes use shared, physical resources."

Sounds like he understands the problem is concurrent processes stepping on each others toes using a shared, data structure. He has nice drawings and charts to go with it. Plus, an early model to solve it statically at compile time. Jweb_Guru noted some limitations of his method but it understands and solves the fundamental problem. Hansen's own work improved on it later plus it was obsoleted by things like Ravenscar, SCOOP and now Rust's method. Nice that he had a whole OS protected from concurrency errors at compile time in the 70's, though. :)

None of that looks like a formalisation or even a description of a data race, at least not in the modern sense.
> Just ignored by mainstream like a lot of good stuff for various reasons. ;)

This paper introduced Monitors. It was of course hugely influential.

Mutex and condition variables, the building blocks of monitors made their way into posix threads and from there into C++11.

I believe Java is a much closer realization of Hansen model as every java object is a Monitor.

Good point. I should qualify that statement better. Hansen's model was statically guaranteeing freedom from issues at compile time. There were projects aiming at that which could've implemented a similar model but didn't. More work could've gone into eliminating its limitations, etc. The Ada and Eiffel people carried the torch, though, with interesting progress. Now Rust is. Ada folks aren't resting, though, as ParaSail is pretty neat.
This system does not support recursion. It also does not have to reason about atomics and so on because it is not intended for multicore systems; indeed, it assumes that all access is serialized through a mutex, and doesn't consider the existence of types like single-threaded reference counters (like Rust's Rc). It is describing threads with exclusive access, and a scheduling mechanism for waiting on locks, but the mechanism is entirely dynamic (and the paper assumes the existence of a virtual machine, which is frequently invoked to deal with tricky cases).

Some relevant lines:

> A Concurrent Pascal compiler will check that the private data of a process only are accessed by that process. It will also check that the data structure of a class or monitor only is accessed by its procedures

is significantly less than what Rust provides, because:

> Processes cannot operate directly on shared data. They can only call monitor procedures that have access to shared data. A monitor procedure is executed as part of a calling process (just like any other procedure).

> If concurrent processes simultaneously call monitor procedures that operate on the same shared data these procedures will be executed strictly one at a time. Otherwise, the results of monitor calls would be unpredictable. This means that the machine must be able to delay processes for short periods of time until it is their turn to execute monitor procedures. We will not be concerned with how this is done, but will just notice that a monitor procedure has exclusive access to shared data while it is being executed.

Rust allows shared access to immutable data and does not require either serialization or the invocation of a virtual machine; it also allows nondeterministic operation to be used as long as it cannot cause memory unsafety.

Additionally, there are other, more basic things that the system is not capable of. For instance:

> (Strictly speaking, a compiler can only check that single monitor calls are made correctly; it cannot check sequences of monitor calls, for example whether a resource is always reserved before it is released. So one can only hope for compile time assurance of partial correctness.)

A major part of what Rust brings to the table is the ability to statically avoid problems like use after free and double free even in a concurrent setting. How would I avoid resource leaks in a system like this without a per-thread allocator (which would likely prevent sending across threads)?

Of course, it's not clear to me that this matters anyway, since I can't destroy a thread or shared data structure!

> Dynamic process deletion will certainly complicate the semantics and implementation of a programming language considerably. And since it appears to be unnecessary for a large class of real-time applications, it seems wise to exclude it altogether. So an operating system written in Concurrent Pascal will consist of a fixed number of processes, monitors, and classes. These components and their data structures will exist forever after system initialization.

I agree that people have a strong tendency to not know what past work has been done, and Concurrent Pascal is admirable, but what Rust does is not the same thing. Rust specifically tackles a lot of hard problems in concurrency that simply do not exist as long as you (1) assume away deallocation for shared data structures, and (2) serialize all access to those data structures.

(There are lots of other specific points I could go into; for instance, Concurrent Pascal apparently lacks facilities for generic programming, so I probably couldn't combine two correct concurrent data structures and expect a third correct concurrent data structure to come out. But that stuff isn't directly related to concurrency).

"I agree that people have a strong tendency to not know what past work has been done, and Concurrent Pascal is admirable, but what Rust does is not the same thing. "

I'm not saying it does. The claim I replied to said preventing things like data races was done till well into the 80's. I showed systematic investigations of the problem plus a production solution had been done by mid-70's. You've nicely showed how far ahead methods like Rust got. ;)

> The claim I replied to said preventing things like data races was done till well into the 80's

No, it definitely didn't: it said data races as a concept seemed to not be formalised until the late 80's.

my impression is that Smalltalk (and like most languages) does not put nearly as much effort into controlling mutability

VisualWorks got the ability to make certain objects immutable sometime in the early 2000's. The VM engineers were particularly proud of that one.

Smalltalk has had that since the early 1980s

That really depends on which Smalltalk you're talking about, I think. Various implementations really had different concurrency models. Some implementations used native OS threads. Others ran synchronously "inside the image" but spawned threads to interact with the OS.

that's interesting, AFAIK smaltalk is fully dynamic and has no compile-time type checking, so how would it statically enforce the equivalent of Send and Sync constraints?
In most implementations, Smalltalk is compiled to bytecode, and uses late binding. It's usually run on a JIT.

how would it statically enforce the equivalent of Send and Sync constraints?

In some Smalltalks, normal execution is synchronous. Many of them also use read and write boundaries for various purposes. The former gives you Send and Sync constraints for free. The latter can be used in difficult edge cases. (Like when you're calling out to the OS.)

> The former gives you Send and Sync constraints for free.

Uh, no, it gives you thread safety for free.

When my comment was talking about Send and Sync I was talking about the specific way Rust's typesystem enforces thread safety. I'm not saying that other languages don't, I'm saying that Rust's method of enforcing it might be one of the few unique things it does

Smalltalk is dynamically typed. How can it have a static type system feature?
(Yeah, just to clarify, in my comment above, I'm talking about systems similar to Rust's specific system for enforcing thread safety; not arbitrary systems that enforce thread safety in a different way)
I'd like to read more about this, do you have a recommendation of where I can look? I have a basic fluency in Smalltalk, but it's been a while.
Interesting, thanks!
Putting them well together in Rust is debatable.

Writing proper and safe concurrent code in Rust still looks horrible compared to languages which supports it natively in the type system, e.g. ponylang type capabilities. You still have to manually maintain locks, and it's also much slower.

There's a safety model, but it's only best practice, not enforced by the language nor the compiler. So calling it "safe" and "truly unique" is way off. Even parrot has a better, safe and lockless threading model, which guarantees safety.

> You still have to manually maintain locks, and it's also much slower.

You ... don't. That's just the concurrency model that the abstractions in the stdlib expose, but Rust's safety is generic enough that you can use different abstractions (e.g. lockfree ones). See crossbeam for some alternative models. There are also transactional memeory impls in Rust. They still use Send and Sync for safety though. Manually maintaining locks is a feature of the concurrency library used, not the safety model.

Pony's system is actually pretty close to that of Rust. Sync is "immutable", Send is "isolated" (sort of). Of course, capabilities are different from auto traits, but the idea behind using these two capabilities for concurrency safety is similar.

(So I was wrong that Rust's concurrency safety system is unique, since Pony has something based on the same concepts)

> There's a safety model, but it's only best practice, not enforced by the language nor the compiler.

Yeah, Send and Sync are technically a part of the stdlib (and can be reimplemented outside of it, aside from a small interaction with statics -- Send/Sync are treated as special by the compiler when it comes to statics). However, it is enforced by the compiler in the sense that if you avoid best practice (using Send and Sync), you can't write parallel code without dropping to `unsafe`. If you do that you can design safe abstractions around that and use whatever concurrency enforcement you wish, though pretty much everyone sticks to Send and Sync since it works well with the rest of the language.

(Because of the statics thing, "not part of the language" is debatable, anyway)

Having a language-imposed concurrency model would be useless and actively harmful in a system language. Rust provides the building blocks to build whatever safe abstractions are appropriate for the problem and domain at hand, plus a set of out of the box abstractions relatively low level that will be familiar to people coming from other system languages.

And of course the means to get rid of any abstraction and safety when required.

edit: accidentally a word

> Having a language-imposed concurrency model would be useless and actively harmful in a system language.

It works fine in Ada.

Edit: Why am I downvoted for this? A language-defined concurrency model is indispensable for having safe and deadlock-free concurrency in the language (rather than in arbitrary utility libraries). That's why concurrency was explicitly included into Ada and into the additional Ravenscar profile. The rationale for this has always convinced me and it also works fine in Ada, so anybody care to elaborate what's wrong with it?

It worked great for Concurrent Pascal and Solo:

http://brinch-hansen.net/papers/

I agree it's better to have mechanisms that can be turned into a proper model. A good default, though, greatly improves consistency and reliability in real-world systems.

Only if development of the language and OS are separate. If you co-develop the language with the OS, then it makes perfect sense to push safety features into the language (or conversely, to remove them from the language when they are no longer appropriate).

LISP-strength macros give you most of this capability.

OS are not the only use cases for system languages and developing a language to be tightly tied to an os (and viceversa) is a great way to condemn both to obscurity.
Erm... I think Dennis Ritchie would have had a few things to say about that.
good point; though, while designed to work well with each other, neither unix nor C require the other.
> There's a safety model, but it's only best practice, not enforced by the language nor the compiler.

What? How is it not enforced?

> You still have to manually maintain locks, and it's also much slower.

Slower? I don't think it can be any faster even in theory!