Hacker News new | ask | show | jobs
by brabel 16 days ago
I just upgraded some code to Zig 0.16.0 and I am actually really happy with the results. It impacted A LOT of things, but the changes were actually very good and seems to have set the language for a bright future, especially with the new IO mechanism which allows supper efficient code that looks good whether it's implemented single-threaded, multi-threaded or just via an event loop!

If you haven't tried Zig since 0.16.0 was released, I highly recommend having a look. The release notes for this release were huge!!

https://ziglang.org/download/0.16.0/release-notes.html

3 comments

the “(super) efficient” is not there yet. Io is still dynamic dispatch with multiple layers of indirection. afaik it’s slower than before.

the upcoming releases are expected to provide a solution to this “dispatch is comptime-known, but still dynamic” problem, and drop the loses in efficiency.

Hmm in the 2025 talk ( https://youtu.be/f30PceqQWko?si=qZESxMaSyt7fYMfz ), Andrew emphasizes that this approach is more efficient than before- even showing compiled assembly iirc. I guess that was a one-off?
My guess is that one of these (Andrew) is measuring syscalls and the other is measuring vtable indirections.
A vtable indirection is essentially free when you're going to perform a syscall. What matters is that the buffer is above the vtable (which is already the case for the current implementation) so that you don't pay for the indirection when hitting the buffer.
Apples and oranges, yeah. I should have spelled that out more, thanks.
And if you write to a bytes buffer?
Wow that’s gnarly it’s using dynamic dispatch. I mean I get it, but I thought zig was some sort of performance demon.
if youre doing io, one pointer indirection seems unlikely tp be rate limiting. same for allocation (the other dynamic dispatch in zig)
It's not just I/O, it's also mutexes, condition variables, time, etc. It's not horrible, but it does add up, so calling it super efficient is a stretch.
A modern allocator with per-thread cache can satisfy some allocations in 20-30 cycles - dynamic dispatch can easily double that, even if the target is still in cache.

It's one of these things where it's extremely use case dependant - like many performance issues, you probably don't care about it - but when you do it matters.

Inderect call cost is a few cycles, if predicted. Now, you can argue, that it may be mispredicted and misprediction would cost about 20-30 cycles. But if it is mispredicted, then you are not calling into allocator often enough. And if you don't hammer it hard, then why do you care about preformance?
I believe their plan is using "restricted function pointers", where you can specify that a pointer will only ever be to a function defined in the codebase. I'm pretty sure they also have plans for devirtualization, but I haven't followed super closely.
"you can specify that a pointer"

i dont think you need to specify that. the compiler can figure it out and do an optimization pass at the end.

Oh, is it not a specific keyword? I thought they were thinking of it being a keyword so you could be sure that it was restricted, in case a variable or function was exported that took in a foreign pointer.
There are going to be builtins to control this. The compiler will not do it on its own.
The parent seems to be talking about efficient code style, not necessarily performance implementation, as they go on to discuss how it looks.

That is, I think the point was DevX not io performance.

> especially with the new IO mechanism which allows supper efficient code that looks good whether it's implemented single-threaded, multi-threaded or just via an event loop!

I had some trouble understanding how the async/await mechanism works:

  var foo_future = io.async(foo, .{args});
  defer if (foo_future.cancel(io)) |resource| resource.deinit() else |_| {}

  var bar_future = io.async(bar, .{args});
  defer if (bar_future.cancel(io)) |resource| resource.deinit() else |_| {}

  const foo_result = try foo_future.await(io);
  const bar_result = try bar_future.await(io);
My assumption is that calling io.async using an event loop implementation of IO, it will internally start a "task" (or whatever it should be called) and that the future is a handle to it. So far so good.

The part that I don't understand is what happens when you call future.await(io). Will the IO implementation somehow suspend the current function and resume once the future is resolved? If so, does that mean that every function in zig is a stackless coroutine?

> If so, does that mean that every function in zig is a stackless coroutine?

No and yes.

If you're using Io.Threaded, then the concurrency model is multithreading and calling Future.await will block your thread on a OS futex.

If you're using Io.Evented, then the concurrency model is green threads / fibers and calling Future.await will suspend the current green thread by yielding (swapping CPU state with another fiber).

Zig currently does not support stackless coroutines so today you can't have that, but we used to have them (pre self-hosted compiler), and there's an accepted proposal to bring them back, in which case any function that calls await, or that otherwise has a suspension point, would have to be transformed into a stackless coroutine by the compiler, yes. The plan is for that to happen transparently without requiring an `async` annotation in the function signature, like we already did in the past.

This is an old post of mine that explains how that worked at a high level: https://kristoff.it/blog/zig-colorblind-async-await/

> there's an accepted proposal to bring them back, in which case any function that calls await, or that otherwise has a suspension point, would have to be transformed into a stackless coroutine by the compiler, yes. The plan is for that to happen transparently without requiring an `async` annotation in the function signature, like we already did in the past.

If the compiler will treat functions that call a library function (Future.await) as special and change how the call site is compiled, why not just have an `await` keyword that when present will convert it into a state machine that can be suspended/resumed?

In other words: What is gained by not having a keyword that changes how a function is emitted if the compiler will change it anyway based on detection of a library call?

It's not Future.await that is special per se, it's that it (Future.await) will have in it somewhere either in its body or in another function that it calls in turn, a use of `suspend` (using old Zig syntax).

`suspend` is the keyword that best fits your description (you can think of it as being closely related to `yield` in other languages), and it's kind of a lower-level primitive compared to async/await.

This also means that, according to our plans, Zig will have to propagate "stackless-ness" upwards in the call chain while analyzing the code (thus making Future.await not special per-se).

> This also means that, according to our plans, Zig will have to propagate "stackless-ness" upwards in the call chain while analyzing the code (thus making Future.await not special per-se).

Very interesting, will be following Zig development more closely. Thanks for sharing!

For future reference you can format code on hn with a newline first then indenting each line by 2+ spaces. (Rather than triple tick)
Fixed it. thanks!
Maybe one day it'd be possible to use these new features, but I find myself using `.use_llvm = true` in my zig builds for stability and lesser tested targets.