Hacker News
by steveklabnik 3739 days ago

  > it gets hard to limit the number of memory allocations/accesses
  > as well as data copies going on.
I would be interested in hearing more about this, if you have the time.

In this video[0] on Rust, I couldn't tell directly from the example whether the Vec structure is copied into the take() function. If ownership is being handed over, why is a copy being made? And if a copy really isn't being made, why don't the function signature and the call site reflect that?

I hope I'm making sense... Please tell me if I should try to rephrase, or if I need clarification on the subject.

(BTW, this is also something I really dislike about C++: two function calls that look exactly the same, foo(x) and bar(x), could be either pass by value or pass by reference, and you have to dig up the function signatures to figure out which.)

[0] https://youtu.be/O5vzLKg7y-k?t=1110

Gotcha. I thought the video would have explained this, but since it didn't get through, let me try :)

  fn take(vec: Vec<i32>)

In Rust, the default way that things operate is to _move_. So when you call take(), we'd say that the vector moves into the function.

Moving is always a memcpy. There's no way to change this behavior, so you can always know that it's true. But with a Vec, there's a subtlety: the Vec itself is three words: a pointer to the heap, a length, and a capacity. So when I say that the Vec is memcpy'd, I mean literally those three things. It doesn't try to follow the pointer and copy the data that it points to.

Incidentally, that pointer is why we say that it moves: because two copies of the Vec structure itself would mean an aliased pointer to the heap, the compiler doesn't allow the use of the older binding after take() is called.
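A minimal sketch of that move, assuming current std. `take` and `move_keeps_buffer` are hypothetical names; `take` returns the address of the heap buffer it received, so we can check that the move copied only the Vec's header and left the elements where they were. The three-word size is how std's Vec is laid out today, not a documented guarantee.

```rust
use std::mem::size_of;

// Hypothetical take(): consumes the Vec and reports where its heap buffer lives.
// The Vec (and its buffer) is dropped when take() returns; we only compare addresses.
fn take(vec: Vec<i32>) -> *const i32 {
    vec.as_ptr()
}

// Returns (buffer address before the move, buffer address seen inside take()).
fn move_keeps_buffer() -> (*const i32, *const i32) {
    let v = vec![1, 2, 3];
    let before = v.as_ptr();
    let inside = take(v); // copies the (ptr, len, capacity) header; elements stay put
    // println!("{:?}", v); // error[E0382]: borrow of moved value: `v`
    (before, inside)
}

fn main() {
    // Three machine words on current Rust (an implementation detail).
    println!("size_of::<Vec<i32>>() = {}", size_of::<Vec<i32>>());
    let (before, inside) = move_keeps_buffer();
    println!("same heap buffer after move: {}", before == inside);
}
```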

For simpler types, like integers:

    fn take(i: i32)
they implement a trait called Copy. This means that when you call take, the i32 is copied, in the same fashion that the Vec was copied. The only difference is that there's no problem with having these two copies, as an i32 is, well, just an i32. So the compiler won't prevent the use of the binding outside the function like it would with a non-Copy type.
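A small sketch contrasting this with the Vec case. `take`, `Point`, and `take_point` are hypothetical names for illustration; the point is that a Copy value stays usable in the caller after being passed, and that a small struct can opt into the same behavior by deriving Copy (which requires all of its fields to be Copy).

```rust
// Hypothetical take() for a Copy type: the caller keeps its own copy of `i`.
fn take(i: i32) -> i32 {
    i * 2
}

// Deriving Copy (plus Clone, which Copy requires) gives a struct the same behavior.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

fn take_point(p: Point) -> i32 {
    p.x + p.y
}

fn main() {
    let i = 21;
    let doubled = take(i);
    println!("{} {}", i, doubled); // `i` is still usable: it was copied, not moved

    let p = Point { x: 1, y: 2 };
    let sum = take_point(p);
    println!("{:?} {}", p, sum); // same for the Copy struct
}
```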

References are like pointers, but with the extra static checking that Rust does.

    fn take(vec: &Vec<i32>)
References also implement Copy, and so when you pass a reference to a Vec to this function, the same thing happens as in the i32 case. The reference itself gets copied.
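A sketch of the reference case, with hypothetical `take` and `borrow_twice` functions. Copying `r` into `r2` duplicates only the reference; both point at the same Vec, and the caller never gives up ownership of `v`. (`&Vec<i32>` is kept to match the signature above; idiomatic code would usually take `&[i32]`.)

```rust
// Hypothetical take() that borrows instead of consuming.
fn take(vec: &Vec<i32>) -> usize {
    vec.len()
}

fn borrow_twice() -> (usize, usize, bool, Vec<i32>) {
    let v = vec![1, 2, 3];
    let r = &v;
    let r2 = r; // copies the reference itself; `r` remains usable
    let same = std::ptr::eq(r, r2); // both references point at the same Vec
    let (a, b) = (take(r), take(r2));
    (a, b, same, v) // `v` was never moved, so we can still hand it back
}

fn main() {
    let (a, b, same, v) = borrow_twice();
    println!("{} {} {} {:?}", a, b, same, v);
}
```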

This is sort of a long-winded way of saying that Rust is "pass by value", not "pass by reference." Some people like to call this "pass reference by value", since while it's always pass by value, references are themselves values.
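To make "pass reference by value" concrete, here is a hypothetical sketch. `rebind()` reassigns its reference parameter, and only its local copy changes; `bump()` writes *through* a `&mut`, and the caller sees that. Under true pass by reference, reassigning the parameter would also change the caller's variable.

```rust
// `r` is the function's own copy of the caller's reference (pass by value):
// pointing it somewhere else is invisible to the caller.
fn rebind<'a>(mut r: &'a i32, other: &'a i32) -> (i32, i32) {
    let original = *r;
    r = other;
    (original, *r)
}

// Writing through a mutable reference does reach the caller's data.
fn bump(r: &mut i32) {
    *r += 1;
}

fn main() {
    let a = 1;
    let b = 2;
    let ra = &a;
    println!("{:?}", rebind(ra, &b)); // (1, 2)
    println!("{}", *ra);              // 1: the caller's reference was not changed

    let mut c = 10;
    bump(&mut c);
    println!("{}", c); // 11
}
```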

What if we _did_ want to make a full copy of the Vec, including its elements? For that, we have to use the .clone() method.

    let v: Vec<i32> = vec![1, 2, 3];
    let v1 = v.clone();

Now v and v1 are full, independent copies of the same data. In other words, a move never makes a deep copy; you have to ask for one with .clone().

    let v: Vec<i32> = vec![1, 2, 3];
    let v1 = v; // a move, always a shallow memcpy
    let v2 = v1.clone(); // a deep copy
You can always see, syntactically, when you're making a possibly expensive copy of something.
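A runnable version of the move-then-clone snippet above, wrapped in a hypothetical `move_then_clone` helper. It checks that the clone got its own heap buffer and that changing the clone leaves the original alone.

```rust
// Moves `v` into `v1`, deep-copies `v1` into `v2`, then modifies only `v2`.
fn move_then_clone() -> (Vec<i32>, Vec<i32>, bool) {
    let v: Vec<i32> = vec![1, 2, 3];
    let v1 = v;              // move: shallow copy of (ptr, len, capacity)
    let mut v2 = v1.clone(); // deep copy: new heap buffer, elements copied
    let separate_buffers = v1.as_ptr() != v2.as_ptr();
    v2.push(4);              // changing the clone leaves v1 untouched
    (v1, v2, separate_buffers)
}

fn main() {
    let (v1, v2, separate) = move_then_clone();
    println!("{:?} {:?} {}", v1, v2, separate); // [1, 2, 3] [1, 2, 3, 4] true
}
```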

Does that help? I'm happy to elaborate further.

It was a great detailed explanation that also illustrates why I've held off on Round 2 of reviewing the docs. I mean, I read this...

"This is sort of a long-winded way of saying that Rust is "pass by value", not "pass by reference." Some people like to call this "pass reference by value", since while it's always pass by value, references are themselves values."

...and instantly have to focus hard to make sure I'm not slipping on the basic concepts. The wording of many Rust descriptions seems unnecessarily confusing to the point that I just googled value vs reference semantics to make sure my memory wasn't screwed up [more]. In a normal 3GL, pass by reference basically stores a pointer in a variable and passes that value somewhere. That value/reference can be used to modify original data outside its original function. Is that what Rust does? If so, it's just pass by reference "with (caveats/rules here)." End of story. Otherwise, I'll see if I can guess a wording that doesn't merge opposite concepts.

This concept isn't Rust-specific; it's a general PLT thing. Rust is the same as C in this regard. Almost every language is pass by value these days. IIRC Fortran, which is pass by reference, is an exception.

Then just say it's pass by reference instead of the stuff I quoted. Counter anyone else doing the same, in your usual gentle style, by reminding them how unnecessary confusion hurts adoption. I'll get back to helping find more of this stuff in the docs once I'm done moving.

Btw, I tried to get to your draft, but the link didn't work by the time I got back to it. Was it merged into the official docs, or where should I look so I read/review the right thing?

Then I still failed to get it across: it's pass by value :( just like C.

http://github.com/rust-lang/book is the repo, and has a link to the rendered version.

Yes! That definitely reinforces my understanding. So Rust is more like C and less like C++ in that it is pass-by-value everywhere, with the exception that references are not pointers but _real_ references.

Does this make things like "const int * const x" (in C syntax) pointless in Rust?

Does that mean that in Rust I can only pass to a function an immutable reference to a mutable object, but not a mutable reference to an immutable object?

Also, out of curiosity, what would something like a recv() call into the middle of an array (buffer) look like in Rust? Something like "recv(sock_fd, buf + 5 * sizeof(char), buf_len - 5 * sizeof(char), 0)"?

Great :)

  > Does this make things like "const int * const x"
  > (in C syntax) pointless in Rust?
Checking myself with cdecl, because I _always_ get this wrong:

  > declare x as const pointer to const int
This is

  let x: &i32;
(using i32 because it's the default integer type, even though it's different from C's int.)

The variations are:

  let x: &i32; // an immutable binding to an immutable reference
  let mut x: &i32; // a mutable binding to an immutable reference
  let x: &mut i32; // an immutable binding to a mutable reference
  let mut x: &mut i32; // a mutable binding to a mutable reference
More mutability == longer declaration, roughly.
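A sketch exercising all four variations in one hypothetical function, `four_variations`. The commented-out lines show what each form forbids; the error text is what current rustc reports, paraphrased.

```rust
// Exercises the four binding/reference combinations and returns the observed values.
fn four_variations() -> [i32; 6] {
    let a = 1;
    let b = 2;
    let mut c = 3;
    let mut d = 4;
    let mut e = 5;

    // immutable binding to an immutable reference: no rebinding, no writes
    let x: &i32 = &a;
    // x = &b; // error: cannot assign twice to immutable variable `x`
    // *x = 7; // error: cannot assign to `*x`, which is behind a `&` reference

    // mutable binding to an immutable reference: may rebind, still no writes
    let mut y: &i32 = &a;
    let y_before = *y;
    y = &b;

    // immutable binding to a mutable reference: may write, no rebinding
    let z: &mut i32 = &mut c;
    *z += 10;

    // mutable binding to a mutable reference: may write and rebind
    let mut w: &mut i32 = &mut d;
    *w += 100;
    w = &mut e;
    *w += 1000;

    [*x, y_before, *y, c, d, e]
}

fn main() {
    println!("{:?}", four_variations()); // [1, 1, 2, 13, 104, 1005]
}
```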

  > Does that mean that in Rust I can only pass to a function an immutable reference
  > to a mutable object, but not a mutable reference to an immutable object?
You're right in both. No problem treating something mutable as immutable, but treat something immutable as mutable and you get a compiler error.
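A short sketch of both halves of that answer, using hypothetical `read` and `bump` functions: a mutable variable can be lent out immutably, but an immutable one can't be lent out mutably.

```rust
fn read(r: &i32) -> i32 {
    *r
}

fn bump(r: &mut i32) {
    *r += 1;
}

fn main() {
    let mut m = 1;
    println!("{}", read(&m)); // fine: a mutable value lent out immutably
    bump(&mut m);
    println!("{}", m);        // 2

    let frozen = 5;
    println!("{}", read(&frozen));
    // bump(&mut frozen); // error[E0596]: cannot borrow `frozen` as mutable
}
```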

  > Also, out of curiosity,
Rust has a concept called "slices". These are "fat pointers", that have both a pointer and a length inside:

  let x = vec![1, 2, 3, 4, 5];
  let slice = &x[1..3];
Here, 'slice' will be a pair, (ptr, len), so the ptr will point to the interior of the vector, and the length will be 2. If we printed it, we'd see "2, 3".
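A sketch checking those claims, with a hypothetical `middle` helper: the slice holds [2, 3], its pointer sits one element into the vector's buffer, and a `&[i32]` is two words. The two-word size reflects current Rust and isn't something you'd normally rely on.

```rust
use std::mem::size_of;

// Borrow the interior elements at indices 1 and 2 as a slice.
fn middle(x: &[i32]) -> &[i32] {
    &x[1..3]
}

fn main() {
    let x = vec![1, 2, 3, 4, 5];
    let slice = middle(&x); // &Vec<i32> converts to &[i32] automatically
    println!("{:?}", slice); // [2, 3]
    // The slice's pointer is one element past the start of x's buffer...
    println!("{}", slice.as_ptr() == x.as_ptr().wrapping_add(1)); // true
    // ...and the slice reference itself is (ptr, len): two words.
    println!("{}", size_of::<&[i32]>() == 2 * size_of::<usize>()); // true
}
```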

So let's check out the signature of recv:

  ssize_t recv(int sockfd, void *buf, size_t len, int flags);
This pointer, length pair looks suspiciously like buf and len here. That's for good reason. A first step towards a more Rust-like wrapper over recv would look like this:

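  fn recv(socket: libc::c_int, buf: &mut [u8], flags: libc::c_int) -> libc::ssize_t {
    // the slice guarantees ptr/len describe valid, writable memory
    let ptr = buf.as_mut_ptr() as *mut libc::c_void;
    let len = buf.len() as libc::size_t;

    // call the C function explicitly; a bare `recv` here would recurse into this wrapper
    unsafe { libc::recv(socket, ptr, len, flags) }
  }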
which would end up being called like this:

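  recv(sock_fd, &mut buf[5..], 0);
combining the names from your example with the slice syntax from above, heh. &mut buf[5..] starts five bytes in and has length buf.len() - 5, which covers both the buf + 5 and the buf_len - 5 from your C call.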
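To see what handing a sub-slice to a (ptr, len) style API does without pulling in libc, here is a std-only sketch. `fake_recv` is a hypothetical stand-in for recv(): it copies as many incoming bytes as fit into the buffer it's given. Passing `&mut buf[5..]` means it can only ever touch the tail of `buf`.

```rust
// Hypothetical stand-in for a C-style "fill (ptr, len)" call such as recv():
// copies as many bytes of `incoming` as fit into `buf` and returns the count.
fn fake_recv(buf: &mut [u8], incoming: &[u8]) -> usize {
    let n = buf.len().min(incoming.len());
    buf[..n].copy_from_slice(&incoming[..n]);
    n
}

fn main() {
    let mut buf = [0u8; 10];
    let base = buf.as_ptr();

    let tail = &mut buf[5..];
    // Equivalent of `buf + 5` and `buf_len - 5` from the C call:
    assert_eq!(tail.as_ptr(), base.wrapping_add(5));
    assert_eq!(tail.len(), 5);

    let n = fake_recv(tail, b"hello world");
    println!("{} {:?}", n, buf); // 5 [0, 0, 0, 0, 0, 104, 101, 108, 108, 111]
}
```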

The next step would be to turn `flags` into an enum, and take that as an argument rather than a C int. Then you'd want to convert the return type to a Rust integer rather than a C one... eventually, you end up with the API we have in the standard library, which looks like this for a UDP socket, for example:

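  use std::net::UdpSocket;

  fn main() -> std::io::Result<()> {
      {
          let socket = UdpSocket::bind("127.0.0.1:34254")?;

          // read up to ten bytes from the socket (blocks until a datagram arrives)
          let mut buf = [0; 10];
          let (amt, src) = socket.recv_from(&mut buf)?;
          println!("read {} bytes from {}", amt, src);
      } // the socket is closed here
      Ok(())
  }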
A few comments on this:

We only wrote &mut buf, but above I showed you slicing syntax with [..]. When you pass a reference to an array or vector and the function expects a slice, it automatically converts to a slice of the full length. To be more accurate to your original question:

      let (amt, src) = socket.recv_from(&mut buf[1..5])?;
or whatever middle part of the buffer you want.

amt is the number of bytes read, and src is the sender's address, in this particular API.
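A self-contained sketch of receiving into the middle of a buffer with the standard library, assuming loopback networking is available. Unlike the snippet above, it binds to port 0 (any free port) instead of 34254, and adds a second, sender socket plus a read timeout so the example can run on its own; `recv_into_middle` is a hypothetical name.

```rust
use std::io;
use std::net::UdpSocket;
use std::time::Duration;

// Sends one datagram over loopback and receives it into buf[5..].
// Returns (bytes read, the whole buffer, whether src matched the sender).
fn recv_into_middle() -> io::Result<(usize, [u8; 10], bool)> {
    let receiver = UdpSocket::bind("127.0.0.1:0")?;
    let sender = UdpSocket::bind("127.0.0.1:0")?;
    receiver.set_read_timeout(Some(Duration::from_secs(5)))?; // don't hang forever
    sender.send_to(b"abc", receiver.local_addr()?)?;

    let mut buf = [0u8; 10];
    // Only buf[5..] is handed to recv_from, like `buf + 5, buf_len - 5` in C.
    let (amt, src) = receiver.recv_from(&mut buf[5..])?;
    let from_sender = src == sender.local_addr()?;
    Ok((amt, buf, from_sender))
}

fn main() -> io::Result<()> {
    let (amt, buf, from_sender) = recv_into_middle()?;
    println!("{} {:?} {}", amt, buf, from_sender); // 3 [0, 0, 0, 0, 0, 97, 98, 99, 0, 0] true
    Ok(())
}
```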

You'll notice this particular API doesn't expose flags: we try to Do The Right Thing in the standard library, so you don't have to worry about them. Here's the source of recv_from: https://github.com/rust-lang/rust/blob/master/src/libstd/sys... (technically, c::recvfrom there is libc::recvfrom, rather than recv). As an example from a different API, when opening files we set O_CLOEXEC where appropriate: https://github.com/rust-lang/rust/pull/27971

If you want the more exotic options, then you have to dig in and make the calls yourself. Such is the pain of a standard library: it tries to make the common case good, but we still let you dig in and build a different abstraction if you don't like ours.

IIRC moving is a possibly-optimized memcpy; LLVM might remove it entirely.

Yes, I'm speaking purely semantically.

That statement didn't fully sink in the first time I saw it. I'm also curious what it was referring to, especially since Ada/SPARK are mainly used in real-time, resource-constrained systems.