Hacker News new | ask | show | jobs
by jblow 1922 days ago
This entire article is nonsense. To a first approximation, the speed of your program in 2021 is determined by locality of memory access and overhead with regard to allocation and deallocation. C allows you to do bulk memory operations, Rust does not (unless you turn off the things about Rust that everyone says are good). Thus C is tremendously faster.

There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.

The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"

(If you don't get what I am saying here, it might help to know that performance programmers consider malloc to be tremendously slow and don't use it except at startup or in cases when it is amortized by a factor of 1000 or more).

3 comments

> To a first approximation, the speed of your program in 2021 is determined by locality of memory access and overhead with regard to allocation and deallocation.

I wouldn't call that a first approximation. Take ripgrep as an example. In a checkout of the Linux kernel with everything in my page cache:

    $ time rg zqzqzqzq -j1

    real    0.609
    user    0.315
    sys     0.286
    maxmem  7 MB
    faults  0

    $ time rg zqzqzqzq -j8

    real    0.116
    user    0.381
    sys     0.464
    maxmem  9 MB
    faults  0
This alone, to me, says "to a first approximation, the speed of your program in 2021 is determined by the number of cores it uses" would be better than your statement. But I wouldn't even say that. Because performance is complicated and it's difficult to generalize.

Using Rust made it a lot easier to parallelize ripgrep.

> C allows you to do bulk memory operations, Rust does not (unless you turn off the things about Rust that everyone says are good). Thus C is tremendously faster.

Talk about nonsense. I do bulk memory operations in Rust all the time. Amortizing allocation is exceptionally common in Rust. And it doesn't turn off anything. It's used in ripgrep in several places.

> There is this habit in both academia and industry where people say "as fast as C" and justify this by comparing to a tremendously slow C program, but don't even know they are doing it. It's the blind leading the blind.

I've never heard anyone refer to GNU grep as a "tremendously slow C program."

> The question you should be asking yourself is, "If all these claims I keep seeing about X being as fast as Y are true, then why does software keep getting slower over time?"

There are many possible answers to this. The question itself is so general that I don't know how to glean much, if anything, useful from it.

> This alone, to me, says "to a first approximation, the speed of your program in 2021 is determined by the number of cores it uses" would be better than your statement. But I wouldn't even say that.

You chose an embarrassingly parallel problem, which most programs are not. So you cannot generalize this example across most software. When you try to parallelize a structurally complicated algorithm, the biggest issue is contention. I was leaving this out because it really is a 2nd order problem -- most software today would get faster if you just cleaned up its memory usage, than if you just tried to parallelize it. (Of course it'd get even faster if you did both, but memory is the E1).

> There are many possible answers to this.

How come so few people are concerned with the answers to that question and which are true, but so many people are concerned with making performance claims?

> You chose an embarrassingly parallel problem

Well, I mean, you chose an embarrassingly general statement to make? Play stupid games, win stupid prizes.

> which most programs are not

Programs? Or problems? Who says? It's not at all obvious to me that it's true. And even if it were true, "embarrassingly parallel" problems are nowhere close to uncommon.

> When you try to parallelize a structurally complicated algorithm, the biggest issue is contention.

With respect to performance, I agree.

> How come so few people are concerned with the answers to that question and which are true, but so many people are concerned with making performance claims?

The question is itself flawed. Technology isn't fixed. We "advance" and try to do more stuff. This is not me saying, "this explains everything." Or even that "more stuff" is a good thing. This is me saying, "there's more to it than your over-simplifications."

If you do not understand that "embarrassingly parallel" is a technical term and that it's generally understood that most programs are not easily parallelizable, there is not a discussion we can have here.
Really feels like you're just digging yourself deeper into a hole here:

Burntsushi began with "here, parallelization is beating out memory locality and optimization in its impact," but explicitly declined to generalize this the way you generalized your claim about memory.

He further pointed out that ripgrep is fast not just because of parallelization, but also because of how it handles memory.

Then you come back with "you can't always parallelize this well" (which burntsushi agreed with from the beginning) and "you also need to deal with memory" (which ripgrep does)? How is this burntsushi's problem with understanding "embarrassingly parallel" and not your problem with understanding Rust?

I agree that a discussion is difficult. Your comments are so vague and generalized that it's not clear what you're talking about at all. Bring something more specific to the table like the OP did instead of pontificating on generalities.
Thanks for making The Witness, Jonathan. It's one of my favorite games of all time and an exemplar of what it means to work through the consequences of logical axioms.

Makes me all the more sad that you're consistently unable to work through the consequences of Rust's axioms.

I don't disagree that memory access is nowadays critical for speed, but I haven't found Rust standing in the way of optimizing it.

As I've pointed out in the article, Rust does give you precise control over memory layout. Heap allocations are explicit and optional. In safe code. You don't even need to avoid any nice features (e.g. closures and iterators can be entirely on stack, no allocations needed).

Move semantics enables `memcpy`ing objects anywhere, so they don't have a permanent address, and don't need to be allocated individually.

In this regard Rust is different from e.g. Swift and Go, which claim to have C-like speed, but will autobox objects for you.

Bulk operations are not really about layout, they are about whether you mentally consider each little data structure to be an individual entity with its own lifetime, or not, because this determines what the code looks like, which determines how fast it is. (Though layout does help with regard to cache hits and so forth).
"mentally"?

I don't know what you're trying to imply that Rust does, but I'll reiterate that Rust lifetimes don't exist at code generation time. They're not a runtime construct, they have zero influence over what code does at run time (e.g. mrustc compiler doesn't implement lifetimes, but bootstraps the whole Rust compiler just fine).

If you create `Vec<Object>` in Rust, then all objects will be allocated and laid out together as one contiguous chunk of memory, same as `malloc(sizeof(struct object) * n)` in C. You can also use `[Object; N]` or ArrayVec that's is identical to `struct object arr[N]`. It's also possible to use memory pools/arenas.

And where possible, LLVM will autovectorize operations on these too. Even if you use an iterator that in source code looks like it's operating on individual elements.

Knowing your other work I guess you mean SoA vs AoS? Rust doesn't have built-in syntax for these, but neither does C that we're talking about here.

> They're not a runtime construct, they have zero influence over what code does at run time (e.g. mrustc compiler doesn't implement lifetimes, but bootstraps the whole Rust compiler just fine).

This kind of reasoning seems like it makes sense, but actually it is false. ("Modern C++" people make the same arguments when arguing that you should use "zero-cost abstractions" all over the place). Abstractions determine how people write code, and the way they write the code determines the performance of the code.

When you conceptualize a bunch of stuff as different objects with different lifetimes, you are going to write code treating stuff as different objects with different lifetimes. That is slow.

> If you create `Vec<Object>` in Rust, then all objects will be allocated and laid out together as one contiguous chunk of memory

Sure, and that covers a small percentage of the use cases I am talking about, but not most of them.

> When you conceptualize a bunch of stuff as different objects with different lifetimes, you are going to write code treating stuff as different objects with different lifetimes. That is slow.

This is not how lifetimes work at all. In fact this sounds like the sort of thing someone who has never read or written anything using lifetimes would say: even the most basic applications of lifetimes go beyond this.

Fundamentally, any particular lifetime variable (the 'a syntax) erases the distinctions between individual objects. Rust doesn't even have syntax for the lifetime of any individual object. Research in this area tends to use the term "region" rather than "lifetime" for this reason.

Lifetimes actually fit in quite nicely with the sorts of things programs do to optimize memory locality and allocations.

> Sure, and that covers a small percentage of the use cases I am talking about, but not most of them.

Fortunately the other stuff you are talking about works just fine in Rust as well.

I am talking about RAII. RAII leads to programs that are inherently slow.
You haven't really explained in any detail what is slow about "treating stuff as objects with different lifetimes", and specifically how Rust differs there from C. Can you give an example?

Maybe you'd be interested to hear that Rust's borrow checker is very friendly to the ECS pattern, and works with ECS much better than with the classic OOP "Player extends Entity" approach.

> (If you don't get what I am saying here, it might help to know that performance programmers consider malloc to be tremendously slow and don't use it except at startup or in cases when it is amortized by a factor of 1000 or more).

Rust is now getting support for custom local allocators ala C++, including in default core types like Box<>, Vec<> and HashMap<>. It's an unstable feature, hence not yet part of stable Rust but it's absolutely being worked on.

Arenas have been used in Rust for a long time.
Sure, but you are still going to be constrained greatly in terms of what those allocators are able to do, are you not?
In what way? What kind of constraints are you imagining here?
I guess I am confused by the question. The job of the borrow checker is to constrain what you are allowed to do, and it's well-understood that it constrains you to a subset of correct programs, so that you stay in a realm that is analyzable.
Sure, but the borrow checker only operates on references. Rust gives you the tools to work with raw everything, if you dip into unsafe. Memory allocators, doing this kind of low-level thing, don't work with references. Let's say you want to implement a global allocator (this is the only current API in stable Rust, non-global allocators are on the way). The trait you use, which gives you the equivalent of malloc/free, has this signature:

  unsafe impl GlobalAlloc {
      pub unsafe fn alloc(&self, layout: Layout) -> *mut u8;
      pub unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout);
  }
Note the *mut u8 rather than say, &mut u8. Most people would not be using this interface directly, they'd be using a data structure that uses it internally.

Now, there's a good argument to be had about safe and unsafe, how much you need, in what proportion, and in what kinds of programs... but when you say things like "C allows you to do bulk memory operations, Rust does not" and ask about the borrow checker when talking about allocators, to someone who is familiar with Rust's details, it seems like you are misinformed somehow, which makes it really hard to engage constructively with what you're saying.

I'll try to further bridge some of the understanding gap.

People in this thread keep talking about "arena allocators" as if they are special things that you would use a few times (Grep Guy said this above, for example), or, here you imply they would be used internally to data structures, in a way that doesn't reach out to user-level.

That makes them not nearly as useful as they can be!

The game we are working on is currently 100k lines (150k lines if you count comments etc), and almost all allocations in the entire program are of this bulk type, in one way or another. Like if I want to copy a string to modify it a little bit to then look up a bitmap or make a filename, those are temporary allocations at user level and are not wrapped by anything. The string type is not some weird heavyweight C++ std::string kind of thing that wraps a bunch of functionality, it is just a length and a data pointer.

So the proposal to use unsafe in this kind of context doesn't make sense, since then you are putting unsafe everywhere in the program, which, then, why pretend you are checking things?

You can say, "well you as the end-user shouldn't be doing this stuff, everything should be wrapped in structures that were written by someone smarter than you I guess," but that is just not the model of programming that I am doing.

I understand how you can think the statement "Rust does not (allow you to do bulk memory operations)" is false, but when I say this, part of what I am including in "bulk memory operations" is the ability (as the end user) to pretend like you are in a garbage-collected language and not worry about the lifetime of your data, without having to take the performance penalty of using a garbage-collected language. So if you add back in worrying about lifetimes, it's not the same thing.

If you think "bulk memory allocation" is like, I have this big data structure that manages some API and it has some linked lists, and instead of allocating those nodes on the heap I get them from an arena or pool managed by the bigger structure ... that's fine, it is better than not doing it, but it doesn't help the end user write simpler code, and in practical terms it means that most of the allocations in the program are going to be non-bulk, because there's just too much friction on doing them broadly.

If it helps, I can revise my statement to "Rust enables you to do certain kinds of internal bulk memory allocation, but using bulk allocation broadly and freely across your program goes against the core spirit of the language" ... that sounds pretty uncontroversial? Then to bring it back to the original post, I would say, "This kind of broad use of bulk allocation is important for high performance and simplicity of the resulting code."

One last note, I am pretty tired of the "you don't understand Rust, therefore you are beneath us" line that everyone in the Rust community seems to deploy with even the slightest provocation -- not just when responding to me, but to anyone who doesn't just love Rust from top to bottom. Really it makes me feel that the only useful thing to do is just ignore Rust folks and go do useful things instead. I know I am not the only person who feels this way.