Hacker News
by ethin 498 days ago
No idea how experienced the author is with Zig, but my thoughts:

> No typeclasses / traits

This is purposeful. Zig is not trying to be some OOP/Haskell replacement. C doesn't have traits/typeclasses either. Zig prefers explicitness over implicit hacks, and typeclasses/traits are, internally, virtual classes with a vtable pointer. Zig just exposes this to you.

> No encapsulation

This appears to be more a documentation issue than anything else. Zig does have significant issues in that area, but this is to be expected in a language that hasn't even hit 1.0.

> No destructors

Uh... What? Zig does have destructors, in a way. It's called defer and errordefer. Again, it just makes you do it explicitly and doesn't hide it from you.

> No (unicode) strings

People seem to want features like this a lot -- some kind of string type. The problem is that there is no actual "string" type in a computer. It's just bytes. Furthermore, if you have a "Unicode string" type or just a "string" type, how do you define a character? Is it a single codepoint? Is it the number of codepoints that make up a character as per the Unicode standard (and if so, how would you even figure that out)? For example, take a multi-codepoint emoji. In pretty much every "Unicode string" library/language type I've seen, each individual codepoint is a "character". Which means that if you come across a multi-codepoint emoji, those "characters" will just be the individual codepoints that comprise the emoji, not the emoji as a whole.

Zig avoids this problem by just... not having a string type, because we don't live in the age of ASCII anymore, we live in a Unicode world. And Unicode is unsurprisingly extremely complicated. The author tries to argue that just iterating over bytes leads to data corruption and such, but I would argue that having a Unicode string type, separate from all other types, designed to iterate over some nebulous "character" type, would just introduce all kinds of other problems that, I think, many would agree should NOT be the responsibility of the language. I've heard this criticism from many others who are new to Zig, and although I understand the reasoning behind it, the reasoning behind just avoiding the problem entirely is also very sensible in my mind. Primarily because if Zig did have a full Unicode string and some "character" type, now it'd be on the standard library devs to not only define what a "character" is, and then we risk having something like the C++ Unicode situation where you have a char32_t type, but the standard library isn't equipped to handle that type, and then you run into "Oh this encoding is broken" and on and on and on it goes.
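To make the "what is a character?" question concrete, here is a minimal Go sketch (Go is used only for illustration, and `byteLen` and `codePointCount` are made-up helper names). It measures a family emoji built from three person emoji joined by zero-width joiners. The standard library can count bytes and code points, but neither number corresponds to the single symbol a reader sees.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// family is MAN + ZWJ + WOMAN + ZWJ + GIRL, which many fonts draw as one emoji.
const family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

// byteLen returns the length of s in UTF-8 bytes (hypothetical helper).
func byteLen(s string) int { return len(s) }

// codePointCount returns the number of Unicode code points in s (hypothetical helper).
func codePointCount(s string) int { return utf8.RuneCountInString(s) }

func main() {
	fmt.Println("bytes:", byteLen(family))
	fmt.Println("code points:", codePointCount(family))
	// Ranging over a string yields one code point per step, not one emoji.
	for i, r := range family {
		fmt.Printf("offset %2d: %U\n", i, r)
	}
}
```

For this input the byte length is 18 and the code point count is 5. Getting an answer of 1 would require grapheme cluster segmentation, which Go's standard library does not provide.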

8 comments

> typeclasses/traits are, internally, virtual classes with a vtable pointer

No, they're not. Rust "boxed traits" are, but those aren't what the author means.

> Primarily because if Zig did have a full Unicode string and some "character" type, now it'd be on the standard library devs to not only define what a "character" is, and then we risk having something like the C++ Unicode situation where you have a char32_t type, but the standard library isn't equipped to handle that type, and then you run into "Oh this encoding is broken" and on and on and on it goes.

The standard library not being equipped to handle Unicode is the entire problem. Not solving it doesn't avoid the issue: it just makes Unicode safety the programmer's responsibility, increasing the complexity of the problem domain for the programmer and leaving more room for error.

Not being able to easily write a Rust program without pulling in Unicode handling was one reason I chose C over Rust in the past. When targeting binary sizes measured in kilobytes, pulling in full Unicode handling is not an option, especially since programs without direct human interaction rarely need Unicode.

> The standard library not being equipped to handle Unicode is the entire problem

Zig: I want to be a safer C

C: I don't have string type

Zig: No… not like that!

> The standard library not being equipped to handle Unicode is the entire problem.

what? unicode is in the standard library.

https://github.com/ziglang/zig/blob/master/lib/std/unicode.z...

> In pretty much every "Unicode string" library/language type I've seen, each individual codepoint is a "character"

languages are actually really inconsistent on what they count as a unicode character: https://hsivonen.fi/string-length/

(I don't broadly disagree with you on unicode support, just linking an article relevant to that claim)

There is no nebulous 'character' type. There are bytes, codepoints and glyphs. All languages with Unicode support allow iterating over each for a given string.

> Zig does have destructors, in a way. It's called defer and errordefer.

defer ties some code to a static scope. Destructors are tied to object lifetime, which can be dynamic. For example, if you want to remove some elements from an ArrayList of, say, strings, the strings would need to be freed first. defer does not help you here, but destructors would.

That's actually a great argument in favor of Zig over Rust. I assume Rust automatically writes code equivalent to this for you:

```
defer {
    for (list.items) |str| gpa.free(str);
    list.deinit(gpa);
}
```

When it's spelled out like this, it becomes obvious to the reader that maybe this is the wrong allocation strategy. Maybe the whole thing should go in an Arena. Or, similarly, maybe there should be an ArrayList that holds all the character data that your string ArrayList indexes into with a u32 (or points to with pointers, if you want to update all the pointers on resize). Regardless, I'd be skeptical of code where each string has a separate lifetime even though all the lifetimes could be tied.
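The single-buffer layout described above can be sketched roughly as follows. This is written in Go purely to show the data layout (Go is garbage collected, so the lifetime argument does not carry over directly), and `StringPool`, `Add`, and `Get` are hypothetical names, not anything from Zig's standard library.

```go
package main

import "fmt"

// StringPool stores the bytes of every string in one shared buffer.
// Each string is identified by a uint32 index into the ends table.
type StringPool struct {
	data []byte   // concatenated bytes of all strings
	ends []uint32 // ends[i] is the offset one past the last byte of string i
}

// Add appends s to the pool and returns its index.
func (p *StringPool) Add(s string) uint32 {
	p.data = append(p.data, s...)
	p.ends = append(p.ends, uint32(len(p.data)))
	return uint32(len(p.ends) - 1)
}

// Get returns string i as a slice into the shared buffer.
func (p *StringPool) Get(i uint32) []byte {
	start := uint32(0)
	if i > 0 {
		start = p.ends[i-1]
	}
	return p.data[start:p.ends[i]]
}

func main() {
	var pool StringPool
	a := pool.Add("hello")
	b := pool.Add("zig")
	fmt.Printf("%s %s\n", pool.Get(a), pool.Get(b))
}
```

With this layout there are only two allocations to release, however many strings are stored, which is one way of tying all the lifetimes together.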

Rust makes classic (bad) allocation strategies automatic. Zig makes good allocation strategies more attractive than classic (bad) allocation strategies.

More succinctly: Rust makes bad code safe, Zig makes good code easy.

For me not having strings in Zig and being forced to use the fairly verbose '[]const u8' syntax every time I need a string was a little annoying at first, but it has had the effect of making me comfortable with the idea of buffers in a general sense, which is critical in systems programming. Most of the things that irked me about Zig when first learning it (I'm only a few weeks into it) have grown on me.

Typeclasses are conceptual interfaces. They don’t have anything to do with vtables.

Having just gone down this road in C#, the way Unicode is now handled is via "runes".

Each rune may be comprised of various Unicode characters, which may themselves be 1-4 bytes (in the case of UTF-8 encoding).

The one problem I have with this approach is that all of the categorization features operate a level below the runes, so you still have to break them up. The biggest drawback is that, at least in my (admittedly limited) research, there is no such thing as a "base" character in certain runes (such as family emojis: parents with kids). You can mostly dance around it with the vast majority of runes, because one character will clearly be the base character and one (or more) will clearly be overlays, but it's not universal.

Go does this too. I generally like the idea a lot, as long as it's consistent. The one thing I don't like is the inconsistency.

Not sure about C#, but in Go for example ranging strings ranges over runes, but indexing pulls a single byte. And len is the byte length rather than rune length.

So basically it's a byte array everywhere except ranging. I guess I would have preferred an explicit cast or conversion to do that instead of by default.
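A small Go sketch of the inconsistency described above. `rangeOffsets` is a helper invented for this example; everything else is built-in behavior.

```go
package main

import "fmt"

// rangeOffsets returns the byte offsets at which `range` yields a rune.
func rangeOffsets(s string) []int {
	var offs []int
	for i := range s {
		offs = append(offs, i)
	}
	return offs
}

func main() {
	// U+00E9 (e with acute accent) takes two bytes in UTF-8.
	s := "h\u00e9llo"

	// len counts bytes, not runes.
	fmt.Println(len(s))

	// Indexing yields a single byte, here the first byte of the two-byte sequence.
	fmt.Printf("%#x\n", s[1])

	// range steps rune by rune, so the offsets skip over the second byte.
	fmt.Println(rangeOffsets(s))

	// An explicit conversion gives the rune view everywhere.
	fmt.Println(len([]rune(s)))
}
```

Here `len(s)` is 6 and `len([]rune(s))` is 5, and the range offsets jump from 1 to 3. The explicit `[]rune(s)` conversion is roughly the opt-in style the comment above asks for.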

Runes are how UTF-8 has been handled since its invention. It's just taken some platforms longer to get there than others.

I don't necessarily disagree with not having a string type in a low level language, but you seem very fixated on needing a character type. Why not just have string be an opaque type, and have functions to iterate over code points, grapheme clusters, etc.?
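One possible shape for such an opaque type, sketched in Go under the assumption that the text is stored as validated UTF-8. `Text`, `NewText`, `Bytes`, and `CodePoints` are hypothetical names for this example. A grapheme cluster view is left out because it needs Unicode segmentation data that Go's standard library does not ship.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// Text is an opaque wrapper: callers cannot index it directly and must
// choose a view explicitly. (Hypothetical type, for illustration only.)
type Text struct {
	data []byte
}

// NewText checks that b is valid UTF-8 and wraps a copy of it.
func NewText(b []byte) (Text, bool) {
	if !utf8.Valid(b) {
		return Text{}, false
	}
	return Text{data: append([]byte(nil), b...)}, true
}

// Bytes returns a copy of the raw UTF-8 bytes.
func (t Text) Bytes() []byte { return append([]byte(nil), t.data...) }

// CodePoints decodes the text into Unicode code points.
func (t Text) CodePoints() []rune {
	var out []rune
	for rest := t.data; len(rest) > 0; {
		r, size := utf8.DecodeRune(rest)
		out = append(out, r)
		rest = rest[size:]
	}
	return out
}

func main() {
	txt, ok := NewText([]byte("na\u00efve"))
	fmt.Println(ok, len(txt.Bytes()), len(txt.CodePoints()))
}
```

There is no "character" type anywhere in this sketch. The caller has to say whether they mean bytes or code points, which is the point of the suggestion.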