by bluejekyll 2372 days ago
I appreciate that you're trying to frame this as areas where all of these languages can be successful in these spaces, but for Rust in particular, this is an odd take:

"In order to be a system language AND also do everything that people want a system language to do (games, embedded, high performance) you need (I assert) to have rough edges and dangerous pit falls.

"Rust will eventually beat C/C++ on making web browsers and similar technologies because that's what it was built to do. However, Rust probably won't be able to beat C/C++ in game development and total OS development (although it can probably be partially used for both)."

Rust has all the necessary escape hatches (through unsafe) required for these spaces. There are people working in these spaces with Rust successfully, today. So, while the other languages you mention might find success here as well, there is no reason (from a technical perspective) that Rust will not.


The caveat here is that some correct software architectures require littering so much "unsafe" in the code (due to incompatible safety models, not actual unsafe-ness) that it largely defeats the purpose, and a software architecture that lets you avoid most "unsafe" produces a worse product while requiring more lines of code to accomplish the same thing.

Rust will always leave plenty of room for C++ to the extent that it tacitly encourages suboptimal software architecture for some types of applications, such as database engines, that commonly rely on safety models Rust was not designed to express.

I do see Rust potentially replacing a lot of backend Java, eventually.

That's all-or-nothing thinking. Enough people do it that what you say will probably happen. Thing is, one can always use multiple tools to achieve their goal. Anything Rust's safety model can't handle might be done with a different model, analyzer, etc. One recommendation I keep making is using "unsafe" Rust, porting it to identical C, throwing every tool we have in the C ecosystem at it, and porting what passes back to Rust with safe wrappers if possible. Rust couldn't prove it safe, but it's externally proven safe (or safe enough), and optionally has protections during interactions via wrappers. You get Rust's benefits on everything else you code in the app plus whatever you include that others manage to get past the borrow checker.

I call this general concept Brute-Force Assurance where you just modify the form of a program to fit existing tools to get their benefits. Just throw every sound and/or complete analyzer plus a lot of test generators at it. Also, code in a way that helps those tools wherever possible. If one can't, then use them on a version designed for verification first to get the algorithm right, step it toward optimized version, equivalence tests, repeat, etc.

I'm glad you commented because this is what I wanted to say, but couldn't think of a good way to lead into it.

IIRC while Rust does allow you to switch up allocators, it doesn't let you mix and match your runtime with multiple allocators. (Each artifact can be linked with at most one allocator at a time.) There are applications where you want to have multiple allocation strategies for performance reasons.

Uniqueness is really useful for most applications, but there are times that you want data structures that allow multiple pointers to the same data. Having to do this in Rust is going to be a bigger chore than doing it in a language that doesn't support uniqueness.

It will be possible for Rust to still participate in these areas (especially because you can use Rust for only part of your project, so you can use it where you don't have allocation or nonuniqueness constraints in your problem space). However, other options are going to offer a better programmer experience.
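
To make the "multiple pointers to the same data" point concrete, here is a minimal sketch of how shared, mutable graph nodes are commonly written in safe Rust with `Rc<RefCell<T>>`. The `Node`, `new_node`, and `link` names are made up for illustration; the extra wrapping and the runtime borrow checks are roughly the chore being described.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Hypothetical graph node: several owners may point at the same node.
struct Node {
    value: i32,
    neighbors: Vec<Rc<RefCell<Node>>>,
}

// Allocate a node behind a reference-counted, interior-mutable handle.
fn new_node(value: i32) -> Rc<RefCell<Node>> {
    Rc::new(RefCell::new(Node {
        value,
        neighbors: Vec::new(),
    }))
}

// Add an edge from `from` to `to`; `to` gains one more owner.
fn link(from: &Rc<RefCell<Node>>, to: &Rc<RefCell<Node>>) {
    from.borrow_mut().neighbors.push(Rc::clone(to));
}
```

Linking two nodes to the same target leaves three strong references to it, and a write made through one path is visible through the other. Cycles built this way leak unless `Weak` is used for the back edges.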

> IIRC while Rust does allow you to switch up allocators, it doesn't let you mix and match your runtime with multiple allocators. (Each artifact can be linked with at most one allocator at a time.) There are applications where you want to have multiple allocation strategies for performance reasons.

So, sort of yes, and sort of no. Like, you can swap the global allocator, but there's no way to parameterize standard library stuff over anything but the given allocator. But for your own code, you can write and use allocators however you want. Arenas are often popular, for example.
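
As a rough illustration of "write and use allocators however you want", here is a sketch of a tiny typed arena that hands out index handles instead of references. `Arena` and `Handle` are hypothetical names, not a standard library API, and a real arena would likely add freeing in bulk, generations, or raw memory management. Swapping the process-wide allocator is a separate mechanism (the `#[global_allocator]` attribute) and is not shown here.

```rust
// Hypothetical typed arena: values live in one Vec and are addressed by index.
struct Arena<T> {
    items: Vec<T>,
}

// Copyable handle into the arena; it is only meaningful for the arena that issued it.
#[derive(Clone, Copy, PartialEq, Eq, Debug)]
struct Handle(usize);

impl<T> Arena<T> {
    // Reserve space up front so early allocations do not reallocate.
    fn with_capacity(cap: usize) -> Self {
        Arena {
            items: Vec::with_capacity(cap),
        }
    }

    // Store a value and return a handle to it.
    fn alloc(&mut self, value: T) -> Handle {
        self.items.push(value);
        Handle(self.items.len() - 1)
    }

    // Shared access through a handle (panics on an out-of-range handle).
    fn get(&self, h: Handle) -> &T {
        &self.items[h.0]
    }

    // Exclusive access through a handle.
    fn get_mut(&mut self, h: Handle) -> &mut T {
        &mut self.items[h.0]
    }

    // Number of values currently stored.
    fn len(&self) -> usize {
        self.items.len()
    }
}
```

Handles can be copied freely and stored in other structures, which sidesteps the borrow checker at the cost of a bounds check on each access.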

> I do see Rust potentially replacing a lot of backend Java, eventually.

I'm not quite so sure. Rust can be a good fit for services that need to be very optimized for memory or CPU usage. Java brings so many other extremely important benefits like introspection, management, profiling, hot swapping, and being generally more productive due to the fact that it's a GC'd language. Not to mention the huge ecosystem behind it.

> such as database engines, that commonly rely on safety models Rust was not designed to express.

Do you think you could expand on this?

In my experience Rust facilitates all the same operations as either C or C++, generally without even needing to turn to unsafe. What I've found in my own (not DB, but networking) work is that Rust generally asks you to restate the problem in a way that will allow it to be best expressed in Rust. This often differs from the middle-of-the-road implementations people have grown used to in other languages, but it doesn't in any meaningful way prevent you from solving the problem in a safe way.

There are a couple common architectural patterns that are easy to express in C++ but tend to violate assumptions in Rust's safety model. Database engines run into them frequently due to the nature of their core operations, and this ignores that most data structures are intrinsically globals (which Rust doesn't like) due to tight hardware coupling.

Rust assumes that all references to memory are visible at compile-time, and the safety analysis can be applied in cases where this is true with the usual caveats around borrow-checking. The "unsafe" facility is designed to interface with code written in other languages that don't respect Rust's model, and it works well for that. But how do you express the case, common in database engines (because direct storage-backed), where hardware can hold mutable references to most of your address space? There is no way to determine at compile-time if a mutable reference will be unique at runtime or to even sandbox it to a small bit of code. As a consequence, most memory references are effectively mutable. There are workarounds that will minimize the quantity of unsafe code in Rust if you are willing to sacrifice performance and elegance.

In databases, having many mutable references to the same memory has few safety implications because ownership of memory is dynamically assigned at runtime by a scheduler that guarantees safe access without locking or blocking. This safety model solves the hardware ownership problem, which is why it is used, but it also enables quite a bit of dynamic optimization even if all your references are in software so you'd want to do things this way anyway. In C++, you can make all of this largely transparent on top of explicitly mutable references to memory. Again, you can produce a minimally unsafe version of this in Rust but it is going to be significantly uglier and slower.

As more server software moves to userspace I/O and scheduling models (for performance and scale reasons) it will be interesting to see how this impedance mismatch problem is addressed in Rust.

Lest someone gets the wrong idea, Rust makes _mutable_ globals painful to work with; readonly is fine.

As for hardware DMA’able memory, it’s true that it adds friction to work with in Rust. But C or C++ would fall into the same boat - one would need to sprinkle volatile or atomics, as appropriate, to keep the optimizer from interfering. In Rust, you’d need to do the same (ptr::{read,write}_volatile or its atomics).
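
For anyone who hasn't used them, a minimal sketch of the volatile accessors mentioned above. The `DeviceRegs` layout and the two helper functions are hypothetical; only `ptr::read_volatile`, `ptr::write_volatile`, and the `addr_of!` macros come from the standard library.

```rust
use std::ptr;

// Hypothetical register block that a device might read or write on its own.
#[repr(C)]
struct DeviceRegs {
    status: u32,
    data: u32,
}

// Read the status word; the compiler may not elide or merge this load.
// Safety: `regs` must be valid, aligned, and point to a live DeviceRegs.
unsafe fn read_status(regs: *const DeviceRegs) -> u32 {
    unsafe { ptr::read_volatile(ptr::addr_of!((*regs).status)) }
}

// Write the data word; the compiler may not elide or reorder this store
// relative to other volatile accesses.
// Safety: same requirements as `read_status`, plus no conflicting references.
unsafe fn write_data(regs: *mut DeviceRegs, value: u32) {
    unsafe { ptr::write_volatile(ptr::addr_of_mut!((*regs).data), value) }
}
```

Volatile only constrains the compiler; it says nothing about ordering between threads, which is what the atomics are for.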

I’m having a slightly hard time imagining a db where “most” of the address space is DMA’able. I’ve some experience with kernel bypass networking, which has its own NIC interactions wrt memory buffers, but applications built on top have plenty of heap that’s unshared with hardware. What’s an example db where most of the VA is accessible to hardware “arbitrarily”?

Also, regardless of how much VA is shared, there’s going to be some protocol that the software and hardware use to coordinate. The interesting bit here is whether Rust and its type system can allow for expressing these protocols such that violations are compile-time detectable (if not all, perhaps some). Any sane C++ code would similarly try to build some abstractions around this, but how well things can be encoded is up for debate.

When a typical Rust discussion ensues, it’s commonly implied (or occasionally made explicit) that “write X in Rust” == “write X in safe Rust”. And this is the right default. But I think any non-trivial system hyper optimized for performance will have a healthy amount of unsafe code. The more interesting question, to me at least, is how well can that unsafety (and the “hidden”-from-rustc protocol) be contained.

As for a db scheduler obviating the need for a compiler to arbitrate ownership, that’s certainly true to a degree. But, this comes back to the protocol I mentioned - the scheduler is what provides the protocol and so long as other components work via it, it can provide safety barriers (and allow for optimizations). But again, I don’t immediately see why Rust (with careful use of unsafe) couldn’t do the same. And after all, everything is safe so long as things play by the (often unwritten, or poorly written) rules. Once systems get big and hairy, it gets tougher to stay within the guardrails and that’s where getting assistance from your language/compiler can be very helpful.

Some of the Rust libs/frameworks for embedded/microcontrollers deal with hardware accessible memory and otherwise “unsafe” protocols, but I’ve seen some clever ways folks encode this using Rust’s type system.
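
One such encoding is the typestate pattern. The sketch below is hypothetical (the `DmaBuffer`, `CpuOwned`, and `DeviceOwned` names are invented, and no real hardware is involved), but it shows the idea: handing a buffer to the "device" consumes the CPU-side value, so touching the bytes while the transfer is in flight is a compile error rather than a convention.

```rust
use std::marker::PhantomData;

// Marker types for who currently owns the buffer contents.
struct CpuOwned;
struct DeviceOwned;

// Hypothetical DMA buffer whose ownership state is tracked in the type.
struct DmaBuffer<State> {
    bytes: Vec<u8>,
    _state: PhantomData<State>,
}

impl DmaBuffer<CpuOwned> {
    // Create a zeroed buffer owned by the CPU side.
    fn new(len: usize) -> Self {
        DmaBuffer { bytes: vec![0; len], _state: PhantomData }
    }

    // Only CPU-owned buffers expose their bytes for writing.
    fn fill(&mut self, byte: u8) {
        for b in &mut self.bytes {
            *b = byte;
        }
    }

    // Only CPU-owned buffers expose their bytes for reading.
    fn as_slice(&self) -> &[u8] {
        &self.bytes
    }

    // Hand the buffer to the device; the CPU-owned value is consumed.
    fn submit(self) -> DmaBuffer<DeviceOwned> {
        DmaBuffer { bytes: self.bytes, _state: PhantomData }
    }
}

impl DmaBuffer<DeviceOwned> {
    // Stand-in for waiting on a completion; returns ownership to the CPU.
    fn complete(self) -> DmaBuffer<CpuOwned> {
        DmaBuffer { bytes: self.bytes, _state: PhantomData }
    }
}
```

Calling `fill` or `as_slice` on the value returned by `submit` does not compile, because those methods only exist for `DmaBuffer<CpuOwned>`.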

You have some misconceptions about how this all works in real databases. Rust experts who have looked into porting these kinds of C++ database kernels have not been sanguine in my experience. This isn't a theoretical exercise, we need to minimize defects and maximize performance.

- All pointers are ordinary, the fact that the same memory can also be DMA-ed by the hardware is immaterial. You do need accounting mechanisms that let the code infer which objects in memory are at risk of being read/written by a DMA operation. No atomics or volatiles required in userspace. Modern database code is effectively single-threaded.

- Most of the address space in a database is DMA-able because most runtime structures in a database engine must be adaptively persistable to storage. There are various workloads that will force different parts of your runtime state to be pageable because they can overflow RAM while operating within design constraints. Unless you are assuming small databases, that complexity is inconvenient but necessary for robust systems.

- C++ is more expressive than Rust when it comes to making hackery like this transparent, most of which is resolved at compile-time in C++. Much of the mechanics can be taken care of with a pointer wrapper that heavily overlaps the semantics of std::unique_ptr, making the code quite clean and natural. Most code never needs to know how that magic happens. C++ compile-time facilities are currently far beyond Rust.

- You can formally verify the scheduler design, and we sometimes do, but actually implementing it efficiently in real code without the borrow-checker losing its mind is a separate concern.

As I originally stated, you can write such systems in Rust while managing the amount of unsafe code. You just wouldn't want to and there would be little to recommend it compared to the C++ equivalent since it would be objectively worse by most metrics.

It seems you’ve already made up your mind and nothing anyone says will change that :).

The volatile I mentioned isn’t due to concurrency of userspace threads, but to prevent the optimizer from eliminating read/write operations. If the src/dst of those memory ops is DMA’d memory touched by hardware, you’d need to do that. Has nothing to do with concurrency.

Capability to spill to disk is certainly needed, no argument. But “most” of the address space and “most” of the runtime structures? Can you elaborate? Is there an OSS example or some paper or any discussion of such a thing in the open?

You can have custom smart pointers in Rust just as well, and back them with your own mem allocator. While there are features in C++ not currently available in Rust, C++ facilities “far beyond” is hyperbole. How well do you know Rust? Genuine question.
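
A minimal sketch of what I mean by a custom smart pointer, assuming nothing about the C++ wrapper being described. `PagePtr` is a hypothetical name: it owns its value like `Box`, derefs transparently, and records whether it was ever borrowed mutably, which is the kind of bookkeeping a pager might want. A custom allocator behind it is left out.

```rust
use std::ops::{Deref, DerefMut};

// Hypothetical owning pointer that tracks whether its target may have changed.
struct PagePtr<T> {
    inner: Box<T>,
    dirty: bool,
}

impl<T> PagePtr<T> {
    // Take ownership of a value; it starts out clean.
    fn new(value: T) -> Self {
        PagePtr { inner: Box::new(value), dirty: false }
    }

    // True once any mutable access has been handed out.
    fn is_dirty(&self) -> bool {
        self.dirty
    }
}

impl<T> Deref for PagePtr<T> {
    type Target = T;

    // Shared access does not affect the dirty flag.
    fn deref(&self) -> &T {
        &self.inner
    }
}

impl<T> DerefMut for PagePtr<T> {
    // Any mutable access conservatively marks the page dirty.
    fn deref_mut(&mut self) -> &mut T {
        self.dirty = true;
        &mut self.inner
    }
}
```

Callers just write `p.len()` or `p.push(4)` on a `PagePtr<Vec<_>>`; the flag is set without the call site knowing about it.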

Can you point to a concrete example of the "scheduler" pattern you're referring to? I'm not familiar with it, and it's not clear to me from your post how it works.
I might be wrong but I suspect the "scheduler" in this instance is the thing granting write access.

Consider a SQL database engine.

Many threads/processes can be accessing a SQL database at any given time.

But if two of those actors require write access to the same object, only one will be allowed access and all others are blocked.

So at this high level the concurrency is guaranteed by the engine, which then means at the low level the engine can safely assume the access is exclusive.
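
A toy sketch of that idea, to be taken as my guess at the pattern rather than how any real engine does it. `WriteScheduler`, `Grant`, and the integer actor/object ids are all hypothetical; the point is only that exclusivity is decided at runtime by a table, not by the compiler.

```rust
use std::collections::HashMap;

// Outcome of asking for write access to an object.
#[derive(Debug, PartialEq, Eq)]
enum Grant {
    Granted,
    Blocked { holder: u32 },
}

// Hypothetical scheduler: maps object id -> actor id currently holding write access.
struct WriteScheduler {
    owners: HashMap<u64, u32>,
}

impl WriteScheduler {
    fn new() -> Self {
        WriteScheduler { owners: HashMap::new() }
    }

    // Grant write access unless a different actor already holds it.
    fn request_write(&mut self, actor: u32, object: u64) -> Grant {
        let current = self.owners.get(&object).copied();
        match current {
            Some(holder) if holder != actor => Grant::Blocked { holder },
            _ => {
                self.owners.insert(object, actor);
                Grant::Granted
            }
        }
    }

    // Release access; returns false if `actor` was not the holder.
    fn release(&mut self, actor: u32, object: u64) -> bool {
        if self.owners.get(&object) == Some(&actor) {
            self.owners.remove(&object);
            true
        } else {
            false
        }
    }
}
```

Code that runs only after receiving `Grant::Granted` can treat the object as exclusively its own, which is the assumption the low-level engine code relies on.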

There's one escape hatch that Rust doesn't have: a hatch to escape its complexity. These days I do most of my programming in C++, and my first requirement from the language that will replace it for me is that it be simple rather than a shrine to accidental complexity. So I'm looking at Zig and liking what I see so far. I also think its approach to correctness is ultimately less disappointing than Rust's (as it is now) but that's a whole other discussion. Of course, these are personal preferences rather than some universal claims, although when I bet on a programming language I also care about future popularity, and complex languages tend to never gain more than small niche adoption. Anyway, Rust and Zig have such diametrically opposed design philosophies for low-level (AKA systems) programming, that it will be interesting to see their respective adoption dynamics. If, despite my prediction, Rust ends up being more popular, I'll probably prefer it to C++ and use it.
I'm a huge proponent of static typing systems. However, that being said, I always keep in mind that everything has a cost.

* Just because I'm more comfortable with types doesn't mean that everyone else is.

* Someone may want to do something in a type system which is well typed, but only in a different type system.

* Someone may want to do something in a type system which is well typed, but which has some bad compilation characteristics for the given type system.

Even if Rust is objectively better, you still have to get used to the things about it that make it objectively better. And you have to keep up with the changes that are made to it. And you have to understand where those better things fall down (for example Non Lexical Lifetimes ... in which case you have to get used to the NLL acronym that people use).

A simpler language, even if objectively worse, can yield better results if it is used with discipline. Discipline that might be easier to hone with fewer things that need to be considered.

And sometimes better results don't actually matter because the goal isn't the best results tomorrow but reasonably adequate results today.

As, despite our efforts, we haven't been able to find big differences between different languages (considering reasonable choices for the appropriate domain) in any important bottom-line metrics, neither in research nor in industry, I don't think there's much point in even mentioning objective value. The only scientifically acceptable working assumption at this point is that language choice (with the caveat above) makes no significant objective difference. It's like saying, even if rum-raisin ice cream gives us the ability to see through walls I still prefer pizza; we have no reason to believe rum-raisin ice cream does that, so why even mention it? As far as we know, it's all about personal preference -- we have no reason whatsoever to believe that either Rust or Zig are objectively better or worse than the other -- as well as some easily observable secondary objective differences such as popularity.
From Derek Jones' references, I got this study that's about the best I've seen so far showing there is a difference:

http://archive.adaic.com/intro/ada-vs-c/cada_art.pdf

I'll also add that Rust can give you both memory safety and race freedom at compile time. If you debugged heisenbugs, then you know that's a huge benefit. On Lobsters, one guy mentioned being hired for (a year?) to find and fix one in a system. Eiffel's SCOOP had a similar benefit. Languages such as Chapel made parallelism super easy in many forms vs C++ and MPI. Used judiciously, macros can eliminate tons of boilerplate. Erlang's strategy for error handling might go in this list if reliability is a goal.
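
A small sketch of what the compile-time race freedom looks like in practice. `parallel_count` is a made-up function; `Arc`, `Mutex`, and `thread::spawn` are the standard library pieces. Replacing `Arc<Mutex<u64>>` with `Rc<RefCell<u64>>` here is rejected by the compiler, because `Rc` is not `Send`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

// Spawn `workers` threads that each add 1 to a shared counter `per_worker` times.
fn parallel_count(workers: usize, per_worker: u64) -> u64 {
    let total = Arc::new(Mutex::new(0u64));

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Each thread gets its own handle to the same mutex-protected counter.
            let total = Arc::clone(&total);
            thread::spawn(move || {
                for _ in 0..per_worker {
                    *total.lock().unwrap() += 1;
                }
            })
        })
        .collect();

    // Wait for every worker before reading the result.
    for handle in handles {
        handle.join().unwrap();
    }

    let result = *total.lock().unwrap();
    result
}
```

The counter can only be reached through the lock, so there is no way to write the unsynchronized version by accident in safe code.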

There's been quite a few examples where a different choice in language design eliminates entire classes of problems with anything from no effort to significant effort by developer. Increased velocity with fewer bugs during feature integrations and maintenance are provably-beneficial metrics for a business. I think we can say there's scientific evidence of actual benefits from language choices which have potential benefits if used in business. I just can't tell how far, if any, you'll get ahead by using them since there's non-language-design factors to consider that might dominate.

There is a reason why Ada continues to be used in safety critical systems--it works. More bugs are prevented and problems are detected earlier than they would be in a more lax language such as C or C++.

The large uptake and excitement around Rust shows that there are many C and C++ programmers who appreciate the safety guarantees that it provides. The popularity of Rust has actually created a resurgence of interest in Ada and each language has benefitted from the other.

For example, Spark, a well-defined subset of the Ada language intended for formal verification of mission-critical software, is adopting safe pointers that were inspired by Rust (source: https://blog.adacore.com/using-pointers-in-spark).

I would not be surprised if Rust also adds features based on ideas from Ada.

This is good. The "fast and loose" qualities of C and C++ allow far too many errors and security vulnerabilities in software. We have better tools. We just need to use them.

The syntax of Rust with all the different kinds of references looks far too complex, just to get a performance improvement.

Garbage collectors provide memory safety without any special syntax.

Or in Delphi all strings are reference counted. They are mutable, if the ref count is 1, and immutable when it is larger than one. It is memory safe with safe aliasing and needs no special syntax.
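
For comparison, the same "mutable while the count is 1, copy otherwise" behaviour can be sketched in Rust with `Rc::make_mut`, which mutates in place when the reference is unique and clones first when it is shared. The `append` helper is hypothetical; this is an analogy to the Delphi scheme described above, not a claim about how Delphi implements it.

```rust
use std::rc::Rc;

// Append to a reference-counted string with copy-on-write semantics.
fn append(s: &mut Rc<String>, suffix: &str) {
    // make_mut returns a mutable reference to the String, cloning the
    // underlying value first if any other Rc still points at it.
    Rc::make_mut(s).push_str(suffix);
}
```

With a single owner the append happens in place; once a second `Rc` exists, the next append leaves the other holder's text untouched and moves the writer onto its own copy.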

A sufficiently smart compiler could just optimize the reference counting away, and treat everything as immutable, unless it can prove there is no aliasing. Ideally, a language would only have two kinds of reference, mutable and constant, and everything is figured out by the compiler.

There's only one kind of reference: &T

I guess Box<T> is special-typed but at the surface it just looks like a normal type

Yes, there's also a C pointer type, but it's for interop
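
For readers following the syntax argument, a small sketch of the pointer-like types that usually come up: shared `&T`, exclusive `&mut T`, owning `Box<T>`, and raw `*const T`. The function names are invented for illustration.

```rust
// Shared reference: read-only access, any number may coexist.
fn sum(values: &[i32]) -> i32 {
    values.iter().sum()
}

// Exclusive reference: read-write access, only one may exist at a time.
fn double_all(values: &mut [i32]) {
    for v in values.iter_mut() {
        *v *= 2;
    }
}

// Box: an owning heap pointer that otherwise behaves like a normal value.
fn boxed(value: i32) -> Box<i32> {
    Box::new(value)
}

// Raw pointer: unchecked, mainly for interop; dereferencing requires unsafe.
// Safety: `p` must be valid and aligned for reads.
unsafe fn read_raw(p: *const i32) -> i32 {
    unsafe { *p }
}
```

Only the last one needs `unsafe` at the call site; the first three are checked entirely by the compiler.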

I still think an improved C will have a better chance of unseating C.

But getting the C committee to act is about as hard as designing a new PL.

Let’s pretend for a moment that this SaferC exists. Is it also backward compatible with C? Will it require special keywords in the language, like unsafe, to call into original C? This would be the primary benefit, right?

Now, also, will this new SaferC also bring with it any of the other features people appreciate in Rust? Such as data race free code (because of the strong type/trait system and Send/Sync auto types), or match and let statements that support destructuring of types through pattern matching, or monomorphization for zero-overhead polymorphism, or the simple to use tools around the language for managing dependencies, or async programming model that strips away all the complexity of hand written state machines?
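
To show what the destructuring item in that list refers to, a short sketch with a hypothetical `Stmt` enum and `describe` function. The compiler checks that every variant is handled, so adding a new variant later turns a forgotten case into a compile error.

```rust
// Hypothetical statement type, used only to show destructuring in `match`.
enum Stmt {
    Select { table: String, limit: Option<u32> },
    Insert { table: String, rows: Vec<Vec<i64>> },
    Begin,
    Commit,
}

// Pull fields out of each variant directly in the pattern.
fn describe(stmt: &Stmt) -> String {
    match stmt {
        Stmt::Select { table, limit: Some(n) } => {
            format!("read up to {} rows from {}", n, table)
        }
        Stmt::Select { table, limit: None } => {
            format!("read all rows from {}", table)
        }
        Stmt::Insert { table, rows } => {
            format!("write {} rows to {}", rows.len(), table)
        }
        Stmt::Begin | Stmt::Commit => String::from("transaction control"),
    }
}
```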

For me, all of those features make Rust a modern 2020 language. I’m curious what a SaferC would have. And frankly, if it could exist, why hasn’t it been developed in the last 50 years?

No need to pretend, take a look at D as better C? [1]

[1] https://dlang.org/blog/2017/08/23/d-as-a-better-c/

Yes. That’s a great post, but betterC is not directly backward compatible with C. D also lacks many of the features that I’ve grown to like about Rust.

But, let’s say you’re right. D is SaferC—are the folks who are still waiting for a SaferC able to recognize it as such? Or, have they already decided that like Rust it doesn’t meet the criteria of the language that they’re waiting for?