by userbinator 2714 days ago
We have to print much more to the console due to the ANSI escape codes and we also have to do some conditional checks ON EACH BYTE in order to colorize them correctly.

A few extra comparisons and output for each byte shouldn't be that much slower; fortunately the function of this program is extremely well-defined, so we can calculate some estimates. Assuming a billion instructions per second, taking ~1.5s to hexdump ~1 million bytes means each byte is consuming ~1500 instructions to process. In reality the time above was probably measured on a faster CPU, so that number may be 2-3x higher. That is a shockingly high number just to split a byte into two nybbles (expected to be 1-3 instructions), convert the nybbles into ASCII (~3 instructions), and decide on the colour (let's be very generous and say ~100 instructions).
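To make the estimate concrete, here is a rough sketch of the arithmetic and of the per-byte work being described. It is not hexyl's actual code: the function names and the `Category` enum (including which bytes fall into which colour class) are assumptions made up for illustration.

```rust
/// Rough estimate: (instructions per second * seconds) / bytes processed.
fn instructions_per_byte(ips: u64, seconds: f64, bytes: u64) -> f64 {
    (ips as f64 * seconds) / bytes as f64
}

/// Split a byte into its high and low nybbles (one shift, one mask).
fn nybbles(byte: u8) -> (u8, u8) {
    (byte >> 4, byte & 0x0F)
}

/// Convert a nybble (0..=15) to its lowercase ASCII hex digit via a lookup table.
fn hex_digit(nybble: u8) -> u8 {
    b"0123456789abcdef"[(nybble & 0x0F) as usize]
}

/// Produce the two ASCII hex characters for one byte.
fn hex_byte(byte: u8) -> [u8; 2] {
    let (hi, lo) = nybbles(byte);
    [hex_digit(hi), hex_digit(lo)]
}

/// Hypothetical colour classes; the real tool may classify bytes differently.
#[derive(Debug, PartialEq)]
enum Category {
    Null,
    Whitespace,
    Printable,
    OtherAscii,
    NonAscii,
}

/// Decide the colour class with a handful of comparisons.
fn categorize(byte: u8) -> Category {
    match byte {
        0x00 => Category::Null,
        b' ' | b'\t' | b'\n' | b'\r' => Category::Whitespace,
        0x21..=0x7E => Category::Printable,
        0x01..=0x7F => Category::OtherAscii,
        _ => Category::NonAscii,
    }
}

fn main() {
    // Figures from the comment: 1e9 instructions/s, ~1.5 s, ~1 million bytes.
    let estimate = instructions_per_byte(1_000_000_000, 1.5, 1_000_000);
    println!("estimated instructions per byte: {}", estimate);

    let pair = hex_byte(0xA7);
    println!("{} {:?}", String::from_utf8_lossy(&pair), categorize(0xA7));
}
```

Even with the colour decision included, the per-byte logic here is a shift, a mask, two table lookups and a few range checks, which is the gap between expectation and measurement that the comment is pointing at.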

The fact that the binary itself is >1MB is also rather surprising, especially given that the source (not familiar with Rust, but still understandable) seems quite small and straightforward.


Rust binaries can be large because, unlike in C, the standard library is statically linked, and so is jemalloc. Jemalloc will no longer be the default as of the next release, so that will shave off ~300k...
What's replacing Jemalloc?
The system malloc implementation. Users who want to use jemalloc have to opt in, but doing so is relatively easy (using the jemallocator crate from crates.io).
Why was this done?

Did Rust become less dependent on allocator performance, or did system allocators improve enough? IIRC glibc malloc has improved a lot over the last few years, particularly for multithreaded use, but I don't know about Windows / macOS.

So, long ago, Rust actually had a large, Erlang-like runtime. So jemalloc was used. Over time, we shed more and more of this runtime, but jemalloc stayed. We didn't have a pluggable allocator story, and so we couldn't really remove it without causing a regression for people who do need jemalloc. Additionally, jemalloc was already removed on some platforms for a long time; Windows has been shipping the system allocator for as long as I can remember.

So, now that we have a stable way to let you use jemalloc, the right default for a systems language is to use the system allocator. If jemalloc makes sense for you, you can still use it, but if not, you save a non-trivial amount of binary size, which matters to a lot of people. See the parent I originally replied to for an example of a very common response when looking at Rust binary sizes.

It's really more about letting you choose the tradeoff than it is about specific improvements between the allocators.
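For anyone wondering what the pluggable allocator story looks like in practice, here is a minimal sketch using only the standard library. The `Counting` wrapper and `allocation_count` helper are hypothetical names invented for this example; it forwards to the system allocator and counts calls, just to show where a replacement allocator gets plugged in.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts allocations.
struct Counting;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for Counting {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Delegate the real work to the platform's malloc.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is the opt-in point: whichever static is registered here
// serves every heap allocation in the program. With the jemallocator crate
// as a dependency, the static would be a `jemallocator::Jemalloc` instead.
#[global_allocator]
static GLOBAL: Counting = Counting;

/// Number of allocations served so far.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    let boxed = Box::new(42u64);
    let after = allocation_count();
    println!("value {} cost {} allocation(s)", boxed, after - before);
}
```

Without a `#[global_allocator]` item the program just uses the default, which is the tradeoff being described: the allocator is a per-binary choice rather than something baked into every build.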

It seems I was wrong. The new hexyl version is significantly faster (see my other comment).