Hacker News new | ask | show | jobs
by Jweb_Guru 4293 days ago
I definitely understand where you are coming from, and I agree that C++11's ecosystem maturity is a great reason to choose it at the moment.

However, I cannot agree with you that Rust's safety guarantees are not useful for most C++ programs, or that you have to be an "idiot" to do memory-unsafe things in C++11. Someone at Yandex recently did a presentation about Rust [1] in which they pointed to a bit of (completely idiomatic!) C++11 code that caused undefined behavior. The audience, full of seasoned C++ and Java developers, was asked to identify the problem. Not one of them could point to what was causing it (the compiler certainly didn't). The next slide demonstrated how Rust's compiler statically prevented this issue. The issue could have taken weeks to surface and days to track down, and the C++ compiler simply didn't have enough information to determine that it was a problem. This is something that happens over and over to anyone using C++, in any application, not just security-critical ones.

I'm not saying C++11 doesn't improve the situation, because it does--it would be disingenuous to say otherwise. But it's equally disingenuous to imply that C++11 makes memory management straightforward or safe. It does not.

[1] http://habrahabr.ru/company/yandex/blog/235789/ (note: the presentation and site are in Russian).

1 comments

> Someone at Yandex recently did a presentation about Rust[1] in which they pointed to a bit of (completely idiomatic!) C++11 code that caused undefined behavior.

It would be great to have at least that segment of the talk translated. Sounds like a good example.

The example:

  std::string get_url() { 
      return "http://yandex.ru";
  }

  string_view get_scheme_from_url(string_view url) {
      unsigned colon = url.find(':');
      return url.substr(0, colon);
  }

  int main() {
      auto scheme = get_scheme_from_url(get_url());
      std::cout << scheme << "n";
      return 0;
  }
Can you say what is the problem here? What is a string_view?
A non-owning pointer into string memory owned by someone else (effectively a reference into some string). AIUI, the problem is the temporary string returned by get_url() is deallocated immediately after the get_scheme_from_url call, meaning that the string_view reference `scheme` is left dangling.
The string_view pattern is a pretty bad idea and useless with a decent compiler.
What do you mean by useless?

If I have a string like "foo bar baz" and I want the second word, should I copy out that data into a whole new string? That seems rather inefficient.

(How is a compiler going to optimise that away?)

For small strings, a copy is not only faster but more multithreading friendly.

Keep in mind that on a 64-bit architecture a view is at least 16 bytes large and that small strings can be copied to the stack resulting in better locality and reduced memory usage.

Last but not least, with copy elision, your temporaries might not even exist in the first place.

Example:

    std::string data;
    // ...
    auto str = data.substr(2, 3);
    // pretty sure str will be optimized away
    if (str[0] == 'a')
I don't think copy elision[1,2] means what you think it means, it simply allows the compiler to avoid e.g. allocating a new string when returning a string, or avoid allocating a new string to store the result of a temporary. That is, copy elision allows

   std::string str = data.substr(2, 3);
   return str;
to only allocate one new string (for the return value of substr), instead of two. There's no way the compiler can get out of constructing at least one std::string for the return value, especially if there's any form of dynamic substr'ing (e.g. parsing a CSV file with columns that aren't all the same width).

Sharing is only multithreading unfriendly if there's modification happening, and modification of textual (i.e. Unicode) data is bad practice and hard to get right, since all Unicode encodings are variable width (yes, even UTF-32, it is a variable width encoding of visible characters).

Furthermore, a string_view is strictly better than a string for many applications, since a string_view can always be copied into a string by the caller if necessary (i.e. each function can choose to return the most sensible/most performant thing, which is a string_view if it's just a substring of one of the arguments).

The only sensible argument against string_view in C++ I know is: it's easy to get dangling references. Which is correct, but that's a general problem with C++ itself, not with the idea of string views (Rust has a perfectly safe version in the form of &str, which cannot become dangling like in C++).

> Keep in mind that on a 64-bit architecture a view is at least 16 bytes large and that small strings can be copied to the stack resulting in better locality and reduced memory usage.

No, a string_view points into memory that already exists, there's no increased memory usage; a small string copied on to the stack will be part of the string struct, which is at least 3 * 8 = 24 bytes: a pointer, the length and the capacity. Also, a memcpy out of the original string is always going to be more expensive than just getting the pointer/length (or pair of pointers) for a string_view, since the memcpy has to do this anyway.

[1]: http://en.wikipedia.org/wiki/Copy_elision

[2]: http://definedbehavior.blogspot.com/2011/08/value-semantics-...

The slides on Slideshare are surprisingly easy to follow (and were a good introduction for me): http://www.slideshare.net/yandex/rust-c
See slide 42 of STL's recent talk for what I guess will be a similar example: https://github.com/CppCon/CppCon2014/tree/master/Presentatio...
Reproduced here:

  const regex r(R"(meow(\d+)\.txt)");
  smatch m;
  if (regex_match(dir_iter->path().filename().string(), m, r)) {
      DoSomethingWith(m[1]);
  }
- What's wrong with this code?

  - Haqrsvarq orunivbe va P++11
  - Pbzcvyre reebe va P++14
  - .fgevat() ergheaf n grzcbenel fgq::fgevat
  - z[1] pbagnvaf vgrengbef gb n qrfgeblrq grzcbenel
(http://rot13.com/ 'd if you want to guess.)
Seems like this was fixed in C++14 by adding a std::string&& overload.

http://en.cppreference.com/w/cpp/regex/regex_match

The underlying problem is still there, fixing a few of the worst cases in the standard library is helpful but only up to a point. (E.g. anyone with a custom function that does something in a similar vein needs to remember to do the same.)