by jakewins 716 days ago
Man I get vertigo reading this. Reminds me of trying to understand Java constructors and object initialisation.

It’s been a while now, and at least in my experience so far, Go's and Rust's choice of not having special constructors really simplifies a lot.

Is there anyone that’s had the experience of missing constructors once you swapped away from them?


There are a few somewhat esoteric cases where constructors working in-place allow magic which can be hard to replicate otherwise e.g. Rust is still missing guaranteed “placement new” type behaviour.

Unless you want to `ptr::write` individual fields by hand into a `MaybeUninit`, which you can absolutely do mind but that… is not very ergonomic, and requires structs to be specifically opted into this.

Which can be an issue if you want to initialize a 2MB large heap-allocated object (e.g. heap-allocating a large nested struct or a big array).

Without guaranteed “placement new” that can mean that your 2MB object gets constructed on the stack and copied to the heap. And while Linux typically defaults to an 8MB main-thread stack, Windows defaults to 1MB and will crash your program. Or it might work if the compiler optimizes in your favor.
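For what it's worth, here is a minimal sketch of the by-hand workaround mentioned above: allocate the memory first, then write each field through a raw pointer, so no 2MB temporary is ever built on the stack. `Big`, its fields, and `new_big_on_heap` are made-up names for illustration, and the sketch uses the raw allocator API instead of `MaybeUninit` only to stick to long-stable calls.

```rust
use std::alloc::{alloc, handle_alloc_error, Layout};
use std::ptr::addr_of_mut;

// Hypothetical large object; the field names are assumptions for this sketch.
struct Big {
    header: u32,
    payload: [u8; 2 * 1024 * 1024],
}

fn new_big_on_heap(header: u32) -> Box<Big> {
    let layout = Layout::new::<Big>();
    unsafe {
        // Grab uninitialised heap memory with the size and alignment of `Big`.
        let ptr = alloc(layout) as *mut Big;
        if ptr.is_null() {
            handle_alloc_error(layout);
        }
        // Write each field in place; no `Big` value is ever created on the stack.
        addr_of_mut!((*ptr).header).write(header);
        // The pointer is `*mut [u8; N]`, so a count of 1 zeroes the whole array.
        addr_of_mut!((*ptr).payload).write_bytes(0, 1);
        // Every field is initialised now, so the Box may take ownership.
        Box::from_raw(ptr)
    }
}
```

Calling `new_big_on_heap(7)` hands back a `Box<Big>` whose payload was zeroed directly on the heap. It works, but every new field needs another unsafe write, which is the ergonomics complaint.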

It's not something you encounter frequently, it can be worked around, and Rust will eventually solve it ergonomically without introducing constructor hell (probably with just a keyword). But finding the best language-level solution isn't straightforward (efforts to fix this for Rust have been ongoing for 9 years).

>Which can be an issue if you want to initialize a 2MB large heap-allocated object (e.g. heap-allocating a large nested struct or a big array).

>Without guaranteed “placement new” that can mean that your 2MB object gets constructed on the stack and copied to the heap. And while Linux typically defaults to an 8MB main-thread stack, Windows defaults to 1MB and will crash your program. Or it might work if the compiler optimizes in your favor.

C gets a lot of hate, often for good reasons, but at least you know where your memory is coming from when you are allocating it yourself. If you're allocating a large heap-allocated object, you're grabbing the memory directly from the heap.

Memory allocation is one of the areas where currently C/C++ has or had genuine advantages over Rust. Custom allocators took Rust years, and giving standard library constructs like a Vector a custom allocator that isn't the global allocator is still experimental (=opt-in nightly-only). Similarly while Rust gives you good control over where the data ends up being stored, there is no way to make sure it isn't also put on the stack during function execution. One of the implicit assumptions underlying the language seems to be that the stack is cheap and effectively infinite while the heap is expensive. So you have a lot of control over what touches the heap, but less control over what touches the stack.

Those are temporary pains that have remedies in the works. Rust is a fairly young language, and a lot of good-enough solutions get thrown out before ever getting beyond the experimental stage. But if you are writing software today, needing absolute control over exactly which memory your data touches is a good reason to prefer C/C++. Not that that's a very common need.

I'm not persuaded that scribbling on a Box<MaybeUninit<T>> until it's initialised is less ergonomic than the C. Which isn't to say it's a desirable end state, I just don't see C as a more ergonomic alternative even for this application.

It can also be an issue if you want to wrap any API that requires fixed memory locations for objects (such as POSIX semaphores). It's UB to call a POSIX semaphore from any other memory location than where it was initialized, so making a `Semaphore::new()` API is just asking for trouble. You can deal with it by `Box`ing the semaphore, but then you can't construct the semaphore in a shared memory segment (one of the stronger use cases for process-shared semaphores).
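A rough sketch of the alternative shape, assuming the caller owns the storage and the type is initialised in place instead of being returned by value. `FakeSem` is a hypothetical stand-in that only remembers the address it was initialised at; it does not wrap a real `sem_t`.

```rust
use std::mem::MaybeUninit;

/// Hypothetical stand-in for an address-sensitive object such as a POSIX semaphore.
struct FakeSem {
    home: usize, // address the object was initialised at
    count: u32,
}

impl FakeSem {
    /// Initialise inside storage chosen by the caller (stack slot, Box,
    /// shared memory mapping, ...) instead of returning the value by move.
    fn init_in_place(slot: &mut MaybeUninit<FakeSem>, count: u32) -> &mut FakeSem {
        let home = slot.as_ptr() as usize;
        slot.write(FakeSem { home, count })
    }

    /// True while the object still lives where it was initialised.
    fn is_at_home(&self) -> bool {
        self as *const FakeSem as usize == self.home
    }

    fn count(&self) -> u32 {
        self.count
    }
}
```

A by-value `new()` cannot offer this, because the returned value is free to be moved to a new address afterwards. A real wrapper would probably also want `Pin` to keep safe code from moving the value later.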

I have a hunch this is why there's no Semaphore implementation in the Rust standard library, though it could be due to fundamental inconsistencies in semaphore APIs across OSs as well ¯\_(ツ)_/¯

No, Rust doesn't have semaphores in the stdlib[0] because it was not clear what precise semantics should be supported, or what purpose they would serve, since by definition they can't mediate exclusive (and thus write) access to a resource, and mediating access to code isn't much of a Rust convention. And nobody has really championed their addition since.

Furthermore, they still present a fair amount of design challenges in the specific context of Rust: https://neosmart.net/blog/implementing-truly-safe-semaphores...

[0] technically they were there, added in 0.4, never stabilised, deprecated in 1.7, and removed in 1.8

> It’s been a while now, and at least in my experience so far, Go's and Rust's choice of not having special constructors really simplifies a lot.

This take makes no sense. Think about it: you're saying that not having the compiler do any work for you "really simplifies things a lot". Cool, so you have to explicitly declare and define all constructors. That's ok. But think about it, doesn't C++ already offer you that option from the very start? I mean, you are talking about a feature in C++ that is not mandatory or required, and was added just to allow those programmers who really, really wanted to avoid writing boilerplate code to lean on the compiler, and only in very specific corner cases. If for any reason you want the compiler to do that work for you, you need to be mindful of the specific conditions where you can omit your own member functions. Everyone else can simply live a normal life and just add them.

How is this complicated?

Complaining that special member functions make obvious things less simple is like complaining that English is not simple just because you can find complicated words in a dictionary. Yes, you can make it complicated if that's what you want, but there is nothing forcing you to overcomplicate things, is there?

You're mistaken. Rust does not require you to define all constructors. Rust does not have constructors.

All structs in Rust must be initialized using brace syntax, e.g. `Foo { bar: 1, baz: "" }`. This is commonly encapsulated into static functions (e.g. `Foo::new(1, "")`) that act similarly to constructors, but which are not special in any way compared to other functions. This avoids a lot of the strangeness in C++ that arises from constructors being "special" (can't be named, don't have a return type, use initializer list syntax which is not used anywhere else).
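As a small illustration of the point above (the field types are my assumption, since the comment only shows the literal):

```rust
// Field types are assumed for this sketch.
struct Foo {
    bar: i32,
    baz: &'static str,
}

impl Foo {
    // An ordinary associated function: it has a name, a return type, and
    // builds the value with the same brace syntax anyone else would use.
    fn new(bar: i32, baz: &'static str) -> Foo {
        Foo { bar, baz }
    }
}

// Because `new` is not special, it can be passed around like any function.
fn make_with(ctor: fn(i32, &'static str) -> Foo) -> Foo {
    ctor(1, "")
}
```

`make_with(Foo::new)` builds the same value as writing `Foo { bar: 1, baz: "" }` directly.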

This combined with mandatory move semantics means you also don't have to worry about copy constructors or copy-assignment operators (you opt into copy semantics by deriving from Clone and explicitly calling `.clone()` to create a copy, or deriving from Copy for implicit copy-on-assign) or move constructors and move-assignment operators (all non-Copy assignments are moves by default).
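A short sketch of those three cases, using made-up types `Name` and `Point`:

```rust
// Clone only: duplicating the value has to be spelled out with `.clone()`.
#[derive(Debug, Clone, PartialEq)]
struct Name(String);

// Copy as well: plain assignment duplicates the value implicitly.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn demo() -> (Name, Name, Point, Point) {
    let a = Name(String::from("foo"));
    let b = a.clone(); // explicit copy, `a` is still usable here
    let c = a;         // move: using `a` after this line is a compile error

    let p = Point { x: 1, y: 2 };
    let q = p;         // implicit copy, `p` is still usable

    (b, c, p, q)
}
```

There is no user-written copy or move constructor anywhere in this; a move is always a plain bitwise move of the value.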

It's actually rather refreshing, and I find myself writing a lot of my C++ code in imitation of the Rust style.

> You're mistaken. Rust does not require you to define all constructors. Rust does not have constructors.

I don't think you managed to understand what I actually said, and consequently you wrote a whole wall of text that's not related to the point I made.

Your post starts with the flawed assumption that you have to define constructors in Rust, and then continues with your own wall of text (ironically longer than mine) about avoiding boilerplate, which doesn't apply to Rust. I'm not sure you understood my point.

> Your post starts with the flawed assumption that you have to define constructors in Rust (...)

I did not. Read what I wrote.

Just to further illustrate what I'm saying, are you really trying to say that

```
// explicitly annotating this struct is default initializable and copyable
#[derive(Default, Copy, Clone)]
struct Foo { ... }
```

is actually worse than

```
struct Foo {...}; // rule of zero, copy/move/default are defined/deleted based on arcane rules predicated on the contents of Foo
```

> Just to further illustrate what I'm saying, are you really trying to say that (...)

If you read what I wrote you'll notice I was pointing out the absurdity of claiming that being forced to write each and every single factory method/constructor is somehow better and simpler than allowing the compiler to write them for us for trivial classes but still having the compiler step aside when we opt to write each and every single factory method/constructor ourselves.

> How is this complicated?

Here are some ways:

- As a junior programmer, it made the language harder to learn. Language complexity increases super-linearly as each new feature has rules of interaction with several existing features

- Although one eventually learns to avoid the anti-features, you cannot control the actions of others. These features meant to help save keystrokes are happily employed every day, producing hard to read code

Particularly when writing library code for others to use or when maintaining large codebases shared by hundreds of engineers, my experience is that complex features in the language end up used by junior engineers or require consideration in API design.

> As a junior programmer, it made the language harder to learn.

Can you elaborate on your opinion? I mean, I don't think that argument makes any sense. You're talking about an optional feature that, under very specific circumstances, you can get the compiler to fill in for you default implementations for factory methods/constructors.

As a junior developer, it should be very clear to you that if you want to call a function, including copy constructor or copy assignment operators, you need to define them first. Is that too much of a burden to place on a junior developer?

> Although one eventually learns to avoid the anti-features (...)

There are none of those, and obviously special member functions don't pose a problem to anyone.

> Particularly when writing library code for others to use or when maintaining large codebases shared by hundreds of engineers, my experience is that complex features in the language end up used by junior engineers or require consideration in API design.

I don't think you have a good grasp on the subject. I've worked on C++ libraries shared by hundreds of engineers, and API design was always from the start the primary concern. This is not a function of seniority: it's the very basics of writing modular code intended to be consumed by third parties.

Still, special member functions are the very least of anyone's concerns because anyone remotely competent in this domain knows very well that the public interface needs to be explicitly designed to support or reject specific types of uses, and the decision of whether a component could/should be copied/moved is tied to the component's architecture and semantics.

dude, Java constructors are easy... that C++ stuff is really black magic

and from what I understand rust constructors are basically the same as java, no?

Inside a constructor you can access a partially initialised "this" value, and even call methods on it, which leads to rules like: "Do not call overridable methods in constructors"[0], as they can lead to surprising, non-local, bugs.

Rust has functions associated with types which are conventionally used like constructors, but critically the new objects must have all their fields provided all at once, so it is impossible to observe a partially initialised object.
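To make that concrete, here is a sketch with a hypothetical `Endpoint` type whose factory function can fail. All the fallible work happens before the struct literal, so a failure returns before any `Endpoint` exists.

```rust
use std::num::ParseIntError;

// Hypothetical type for illustration.
#[derive(Debug, PartialEq)]
struct Endpoint {
    host: String,
    port: u16,
}

impl Endpoint {
    fn parse(host: &str, port: &str) -> Result<Endpoint, ParseIntError> {
        // If parsing fails we return here; there is no half-built `self`
        // that a method call or another thread could have observed.
        let port = port.parse::<u16>()?;
        // All fields are supplied at once in a single expression.
        Ok(Endpoint {
            host: host.to_string(),
            port,
        })
    }
}
```

Compare that with a constructor body, where `this` already exists while the fields are still being filled in.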

[0] https://learn.microsoft.com/en-us/dotnet/fundamentals/code-a...

Virgil solved this a little differently. The initialization expressions for fields (outside of constructors) as well as implicit assignment of constructor parameters to fields happens before super constructor calls. Such initialization expressions cannot reference "this"--"this" is only available in _constructor bodies_. Initializing fields before calling super and then the chaining of super calls guarantees the whole chain of super constructor calls will finish before entering the body of a constructor, and all fields will be initialized. Thus by construction, virtual methods invoked on "this" won't see uninitialized fields.

https://github.com/titzer/virgil/blob/master/doc/tutorial/Cl...

You can most likely use session types to soundly observe a partially initialized MaybeUninit<MyObject> in Rust. The proper use of session types could ensure that the object is only assumed to be initialized after every field of it has been written to, and that no uninitialized fields are ever accessed in an unsound way. The issue though is that this is not automated in any way, it requires you to write custom code for each case of partial initialization you might be dealing with.
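A hand-rolled sketch of that idea, using a plain typestate pattern as a simpler stand-in for full session types. `MyObject`'s fields and the `Empty`/`IdSet` marker types are assumptions made up for this example.

```rust
use std::marker::PhantomData;
use std::mem::MaybeUninit;
use std::ptr::{addr_of, addr_of_mut};

// Hypothetical object with two fields.
struct MyObject {
    id: u32,
    score: f64,
}

// Marker types recording which fields have been written so far.
struct Empty;
struct IdSet;

struct Partial<State> {
    slot: MaybeUninit<MyObject>,
    _state: PhantomData<State>,
}

impl Partial<Empty> {
    fn new() -> Self {
        Partial { slot: MaybeUninit::uninit(), _state: PhantomData }
    }

    // Writing `id` consumes the Empty state and yields the IdSet state.
    fn set_id(mut self, id: u32) -> Partial<IdSet> {
        unsafe { addr_of_mut!((*self.slot.as_mut_ptr()).id).write(id) };
        Partial { slot: self.slot, _state: PhantomData }
    }
}

impl Partial<IdSet> {
    // Reading `id` is only offered in a state where it has been written.
    fn id(&self) -> u32 {
        unsafe { addr_of!((*self.slot.as_ptr()).id).read() }
    }

    // Writing the last field is the only way to obtain a full MyObject.
    fn set_score(mut self, score: f64) -> MyObject {
        unsafe {
            addr_of_mut!((*self.slot.as_mut_ptr()).score).write(score);
            self.slot.assume_init()
        }
    }
}
```

The type system enforces the order here, but as noted, the states and transitions have to be written by hand for every struct.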

Rust does not have constructors at all[0], it uses factory functions (conventionally named `new_somethingsomething`) but those are not special to the language.

[0] except in the more generalised haskell-ish sense that structs or enum variants can be constructed and some forms (“tuple structs” and “tuple variants”) will expose an actual function
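A quick sketch of that footnote, with made-up `Meters` and `Shape` types: the name of a tuple struct or tuple variant is itself a function and can be passed to `map`.

```rust
#[derive(Debug, PartialEq)]
struct Meters(f64);

#[derive(Debug, PartialEq)]
enum Shape {
    Circle(f64),
}

fn build() -> (Vec<Meters>, Vec<Shape>) {
    // `Meters` is used here as a function from f64 to Meters.
    let lengths = vec![1.0, 2.5].into_iter().map(Meters).collect();
    // Same for the tuple variant `Shape::Circle`.
    let shapes = vec![0.5].into_iter().map(Shape::Circle).collect();
    (lengths, shapes)
}
```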

I've often longed for first class constructors in Go and Rust. It was more of a problem for me with Go because you can omit a struct field when building a value, something you can't do in Rust unless it has an explicit Default impl, and even then you have to explicitly add `..Default::default()` when you're building the value.
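For reference, the Rust side of that looks roughly like this (`Config` and its fields are hypothetical):

```rust
// Hypothetical config type; Default is derived, so every field type
// must itself implement Default.
#[derive(Debug, Default, PartialEq)]
struct Config {
    retries: u32,
    verbose: bool,
    name: String,
}

fn custom() -> Config {
    // Leaving out `verbose` and `name` is only accepted because of the
    // explicit `..Default::default()`; without it this does not compile.
    Config {
        retries: 3,
        ..Default::default()
    }
}
```

So omitted fields are possible, but the omission is always visible at the place where the value is built.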

I never thought that constructors were that burdensome and therefore do not understand the omission in other languages like Go and Rust that followed. Quite the opposite really -- knowing that a type always went through a predefined init was comforting to me when writing Java.

I think people don’t like constructors because of the potential side effects of something happening in constructors, especially if the constructor is big or doesn’t finish properly.

Rust doesn't have constructors. By convention, a static method called new returns a struct - no magic.

I think if you think constructors in Java are easy, you are much, much smarter than I am or have missed some really, really subtle footguns.

Eg:

- Java constructors can return the object before they complete construction, finishing at a later time; this is visible in concurrent code as partially constructed objects

- Java constructors can throw exceptions and return the partially constructed object at the same time, giving you references to broken invalid objects

- Just.. all the things about how calling super constructors and instance methods interleaved with field initialization works and the bazillion ordering rules around that

- Finalizers in general and finalizers on partially constructed objects specifically

I don't in any way claim it's on the same level as C++, but any time I see a Java constructor doing any method calls anymore - whether to instance methods or to super constructors - I know there are dragons

> bazillion ordering rules

There are 3 which pertain to object initialization in Java.

1. super is initialized in its entirety by an implicit or explicit call to `super()`

2. All instance initializers of the present class are invoked in textual order.

3. Constructor code following the `super()` call is executed.

The only awkward thing here is the position of #2 in between #1 and #3, whereas the text of a constructor body suggests that #1 and #3 are consecutive. It gets easier to remember when you recognize that, actually, there's a defect in the design of the Java syntax here. A constructor looks like a normal function whose first action must be a `super()` call. It's not. The `super()` call is its own thing and shouldn't rightly live in the body of the constructor at all.

Edit: Tweaks for clarity.

Those are the normal issues inherent to constructors as a concept (except for the finalizer one).

Any language that has constructors has some complex rules to solve those things. And it's always good to check what they are when learning the language. Java has one of the simplest sets of those rules that I know about.

> - Java constructors can return the object before they complete construction, finishing at a later time; this is visible in concurrent code as partially constructed objects
>
> - Java constructors can throw exceptions and return the partially constructed object at the same time, giving you references to broken invalid objects

Java constructors do not actually return the object. In Java code, it would appear to the caller as though the constructor returns the new instance, but that is not really the case. Instead, the new object is allocated and then the constructor is called on the object in (almost) the same manner as an instance method.

Additionally, Java constructors can only leak a partially initialized object if they store a `this` reference somewhere on the heap (for example, by spawning a thread with a reference to `this`). The assertion that this gives you a reference to a "broken invalid object" is only potentially correct from the perspective of invariants assumed by user-written code. It is perfectly valid and well-defined to the JVM.

> - Just.. all the things about how calling super constructors and instance methods interleaved with field initialization works and the bazillion ordering rules around that

This is a gross mischaracterization of the complexity. There is only a single rule that really matters, and that is "no references to `this` before a super constructor is called". Until very recently, there was also "no statements before a super constructor is called".

> - Finalizers in general and finalizers on partially constructed objects specifically

Finalizers are deprecated.

I think you’re exaggerating the complexity here. There are corner cases yes, but the compiler will warn you about them.

> Java constructors can throw exceptions and return the partially constructed object at the same time

Can you show some sample code to demonstrate this issue?

I mean, Go's zero initialization requires a bit of language lawyering sometimes too.

https://codefibershq.com/blog/golang-why-nil-is-not-always-n...