Hacker News
by qazpot 1080 days ago
I wrote simple implementations (plain if/else/while, printing to stdout) with no optimizations in Rust, Python3 and C:

Rust -> 23.2MiB/s

Python3 -> 28.6MiB/s

C -> 238MiB/s

Does anyone know why Rust's performance is in the same ballpark as Python3's?

I thought it would be closer to C's.

6 comments

Rust’s print function locks by default (because of safety), C doesn’t. For more info see the Rust documentation: https://doc.rust-lang.org/std/macro.print.html

In order to get similar performance as C, you probably need to take care of this lock yourself:

    use std::io::{stdout, Write};
    // Take the lock once and reuse it for every write.
    let mut lock = stdout().lock();
    write!(lock, "hello world").unwrap();

(You also need to make stdout's buffer size match C's.)
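
A fuller sketch of the same idea, assuming a recent stable Rust toolchain: lock stdout once and put a BufWriter on top so that many small writes are batched. `write_lines` is a hypothetical helper, and the 64 KiB capacity is an arbitrary assumption, not C's actual buffer size.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical helper: writes `n` numbered lines to any writer.
fn write_lines<W: Write>(out: &mut W, n: u32) -> io::Result<()> {
    for i in 1..=n {
        writeln!(out, "{}", i)?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Lock stdout once instead of once per print call.
    let stdout = io::stdout();
    let lock = stdout.lock();
    // Assumed capacity; tune it rather than treating 64 KiB as "the C value".
    let mut out = BufWriter::with_capacity(64 * 1024, lock);
    write_lines(&mut out, 5)?;
    // Flush explicitly so a write error is reported instead of dropped.
    out.flush()
}
```

Keeping the loop generic over `Write` means the same code can target a `Vec<u8>` in tests and the locked, buffered stdout in the real program.
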
> Rust’s print function locks by default (because of safety), C doesn’t.

Huh? Traditionally, stdio implementations have placed locks around all I/O[1] when introducing threads—thus functions such as fputc_unlocked to claw back at least some of the performance when the stock bulk functions don’t suffice—and the current ISO C standard even requires it (N3096 7.23.2p8):

> All functions that read, write, position, or query the position of a stream lock the stream before accessing it. They release the lock associated with the stream when the access is complete.

The Microsoft C runtime used to have a statically linked non-threaded version with no locks, but it no longer does. (I’ve always assumed that linking -lpthread as required on some Unices was also intended to override some of the -lc code with thread-safe versions, but I’m not sure; in any case this doesn’t play well with dynamic linking, and Glibc doesn’t do it that way.)

[1] e.g. see https://sourceware.org/git/?p=glibc.git;a=blob;f=libio/iofpu...

Thanks for the correction; I wasn't aware that the latest C11 standard made these functions thread-safe in the spec. (And, as you've said, implementations like glibc already have these locks.)
Take a look at the actual implementation on Stack Exchange: the slower impl is already doing the locking itself.
I wrote this a long time ago, you might find it useful.

https://ismailmaj.github.io/tinkering-with-fizz-buzz-and-con...

Neat tricks. Beyond BufWriter (which I'm already using) and multithreading, I'm guessing there's not much to be done to improve the performance of my "frece" tool (a simple CLI frecency-indexed database) without making it overly complicated. https://github.com/YodaEmbedding/frece/blob/master/src/main....
Thanks for writing this; it led me down a rabbit hole.
C and Python have adaptive buffering for stdout: if the output is a terminal they flush on newlines, otherwise they only flush when their internal buffer is full.

Here's a C program counting, with a 1ms delay between lines. The second column is the time elapsed since the previous read():

   $ ./out | rtss
   4.7ms    4.7ms | 1
   4.7ms          | 2
   4.7ms          | 3
   4.7ms          | 4
   4.8ms    exit status: 0
You can see they were all written in one go. When allocated a terminal, they come out line by line:

   $ rtss --pty ./out
   0.8ms    0.8ms | 1
   1.9ms    1.1ms | 2
   3.0ms    1.1ms | 3
   4.1ms    1.1ms | 4
   4.3ms    exit status: 0
Rust lacks this adaptive behaviour for output, and will always produce the second result, terminal or not.
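
The adaptive behaviour can be approximated by hand. This is a minimal sketch, assuming Rust 1.70 or later for `std::io::IsTerminal`. `adaptive_writer` is a hypothetical helper name, not a standard library API.

```rust
use std::io::{self, BufWriter, IsTerminal, Write};

/// Hypothetical helper: choose a buffering strategy roughly the way C stdio does.
/// `is_tty` is passed in so the choice can be exercised without a real terminal.
fn adaptive_writer<'a, W: Write + 'a>(inner: W, is_tty: bool) -> Box<dyn Write + 'a> {
    if is_tty {
        // Interactive use: pass stdout through unchanged so each line appears promptly.
        Box::new(inner)
    } else {
        // Pipe or file: batch many lines into fewer, larger writes.
        Box::new(BufWriter::new(inner))
    }
}

fn main() -> io::Result<()> {
    let stdout = io::stdout();
    let is_tty = stdout.is_terminal();
    let mut out = adaptive_writer(stdout.lock(), is_tty);
    for i in 1..=4 {
        writeln!(out, "{}", i)?;
    }
    // Flush before exit so buffered output is not lost on the piped path.
    out.flush()
}
```

Boxing the writer keeps the call site identical on both paths, at the cost of dynamic dispatch on each write.
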

Technically, Rust unconditionally wraps stdout in a LineWriter (https://doc.rust-lang.org/std/io/struct.LineWriter.html), which always flushes if it sees a write containing a newline. To maximise throughput you therefore want to batch writes of multiple lines together, for example by wrapping stdout in a BufWriter.
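
One way to see the effect of batching without a real terminal is to count how many writes reach the underlying sink. In this sketch, `CountingSink`, `emit` and the two counting functions are hypothetical names. The sink stands in for the file descriptor, and the LineWriter stands in for what std puts around stdout.

```rust
use std::io::{self, BufWriter, LineWriter, Write};

/// Hypothetical sink that counts how often `write` reaches it,
/// as a stand-in for the number of write syscalls.
struct CountingSink {
    writes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.writes += 1;
        Ok(buf.len()) // pretend every byte was accepted
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` short lines through `out`, then flushes.
fn emit<W: Write>(mut out: W, n: usize) -> io::Result<()> {
    for i in 0..n {
        writeln!(out, "{}", i)?;
    }
    out.flush()
}

/// Line-buffered only, like plain stdout.
fn line_buffered_writes(n: usize) -> io::Result<usize> {
    let mut sink = CountingSink { writes: 0 };
    emit(LineWriter::new(&mut sink), n)?;
    Ok(sink.writes)
}

/// A BufWriter layered on top of the line-buffered writer.
fn block_buffered_writes(n: usize) -> io::Result<usize> {
    let mut sink = CountingSink { writes: 0 };
    emit(BufWriter::new(LineWriter::new(&mut sink)), n)?;
    Ok(sink.writes)
}

fn main() -> io::Result<()> {
    let n = 1000;
    println!("line-buffered:  {} writes", line_buffered_writes(n)?);
    println!("block-buffered: {} writes", block_buffered_writes(n)?);
    Ok(())
}
```

With short lines, the first count grows with the number of lines, while the second stays small because the BufWriter only passes data on when its buffer fills or when it is flushed.
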

You should compile Rust with --release and C with -O3.
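That's only a fair comparison if the optimization levels actually match. Rust has an opt-level option for each build profile, and Cargo's release profile defaults to opt-level 3, so it is worth checking whether the project overrides it.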
In practice O2 and O3 are rarely very different.
Almost certainly the limitation is due to printing, likely buffering or locking.
Can we see your code?