Hacker News
by varajelle 1264 days ago
Regarding the sloc count, the default automated Rust formatting tool is very eager to add lots of lines by basically keeping only one word per line. Something I'm not a fan of, I must say.

It usually does that on iterator chains, which AFAIK do not exist as such in C++, so multiple operations would be expressed as multiple imperative statements.

My C++ is rusty (no pun intended), but I struggle to imagine the C++ equivalent of `vector.iter().map().collect()` being as concise and fitting in fewer than 4 lines.
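For comparison, here is a minimal pre-C++20 sketch of the map-then-collect shape using only `std::transform` and `std::back_inserter`. The function name `doubled` and the doubling lambda are hypothetical, chosen for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <iterator>
#include <vector>

// Rough pre-C++20 analogue of Rust's `v.iter().map(|x| x * 2).collect()`:
// std::transform applies the lambda to every element, and std::back_inserter
// appends each result to `out`.
std::vector<int> doubled(const std::vector<int>& v) {
  std::vector<int> out;
  out.reserve(v.size());  // one allocation, like collect() with a size hint
  std::transform(v.begin(), v.end(), std::back_inserter(out),
                 [](int x) { return x * 2; });
  return out;
}
```

The call itself fits on two lines. What it lacks compared with a Rust chain is laziness: every additional step needs its own intermediate container or another statement.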

I wonder if OP's C++ port doesn't use iterators that much, and how idiomatic it is.

EDIT: the code is not idiomatic at all.

> I wonder if OP's C++ port doesn't use iterators that much, and how idiomatic it is.

I think I only used iterators in places where there's no built-in function on slices, like C++'s strchr and strcspn. (I think Rust's `str` has these, but `[u8]` doesn't.) For example:

C++: https://github.com/quick-lint/cpp-vs-rust/blob/f8d31341f5cac...

    std::size_t length = std::strcspn(c, separators);
    if (c[length] == '\0') {
      return found_separator{.length = length,
                             .which_separator = static_cast<std::size_t>(-1)};
    }
    const char* separator = std::strchr(separators, c[length]);
Rust: https://github.com/quick-lint/cpp-vs-rust/blob/f8d31341f5cac...

    match s
        .as_bytes()
        .iter()
        .position(|c: &u8| separators.contains(c))
    {
        None => FoundSeparator {
            length: s.len(),
            which_separator: INVALID_WHICH_SEPARATOR,
        },
        Some(length) => {
            let found_separator: u8 = unsafe { *s.as_bytes().get_unchecked(length) };
            match separators.iter().position(|c: &u8| *c == found_separator) {
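For readers who have not used the two C functions, here is a minimal self-contained C++17 sketch of the `strcspn`/`strchr` pattern from the C++ excerpt. `strcspn` returns the length of the longest prefix that contains no separator byte. `strchr` then finds which separator ended that prefix. The names `separator_hit` and `find_first_separator` are hypothetical, and this is not quick-lint-js's actual function, which is only partially quoted above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical result type for this sketch (not quick-lint-js's found_separator).
struct separator_hit {
  std::size_t length;   // bytes before the first separator (or the whole string)
  std::ptrdiff_t which; // index into `separators`, or -1 if none was found
};

// Scan the NUL-terminated string `c` for the first byte that appears in the
// NUL-terminated set `separators`.
separator_hit find_first_separator(const char* c, const char* separators) {
  // Length of the prefix of `c` made only of non-separator bytes.
  std::size_t length = std::strcspn(c, separators);
  if (c[length] == '\0') {
    // Hit the terminator: no separator occurs in `c`.
    return separator_hit{length, -1};
  }
  // c[length] is a separator byte, so strchr cannot return nullptr here.
  const char* sep = std::strchr(separators, c[length]);
  return separator_hit{length, sep - separators};
}
```

The Rust excerpt expresses the same two steps with `iter().position(...)` over the byte slice.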
Of course it exists in C++, and has done since before Rust even existed.

Syntax is usually `vector | map | collect`.

> Of course it exists in C++, and has done since before Rust even existed.

Not in C++'s standard library until C++20.

Things don't need to be standardized in an ISO document to exist and be readily available.

I remember using it as early as 2008.

Wow, my C++ knowledge is even worse than I thought. I didn't know it had "pipelines".

https://en.cppreference.com/w/cpp/ranges

It's not "pipelines". It's just an overloaded bitwise-or operator.
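To illustrate the point, here is a minimal C++17 sketch of a pipe built from nothing but an `operator|` overload. `map_adaptor` and `map_with` are hypothetical names. Unlike the lazy `std::views` adaptors, this version is eager and only accepts `std::vector`.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical adaptor object: it just stores the callable until it is
// combined with a container through operator|.
template <typename F>
struct map_adaptor {
  F f;
};

template <typename F>
map_adaptor<F> map_with(F f) {
  return map_adaptor<F>{std::move(f)};
}

// The "pipe" is an ordinary overload of bitwise-or: container on the left,
// adaptor on the right. It eagerly builds a new vector of mapped values.
template <typename T, typename F>
auto operator|(const std::vector<T>& v, const map_adaptor<F>& a) {
  using U = std::decay_t<std::invoke_result_t<const F&, const T&>>;
  std::vector<U> out;
  out.reserve(v.size());
  for (const T& x : v) {
    out.push_back(a.f(x));
  }
  return out;
}
```

Because each `operator|` returns a new `std::vector`, calls chain left to right, e.g. `v | map_with(f) | map_with(g)`. Note that `|` binds more loosely than `==`, so comparisons on the result need parentheses.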
> It usually does that on iterator chains, which AFAIK do not exist as such in C++, so multiple operations would be expressed as multiple imperative statements.

https://en.cppreference.com/w/cpp/ranges

Before C++20, similar functionality was already available in Boost (the Boost.Range adaptors).

> the default automated Rust formatting tool is very eager to add lots of lines by basically keeping only one word per line.

This is not my experience.

Lifetime and `&mut self` noise (and four-space indentation) did cause rustfmt to sometimes split function signatures across multiple lines, but overall, I think rustfmt did a good job.

C++: https://github.com/quick-lint/cpp-vs-rust/blob/f8d31341f5cac...

    lexer::parsed_identifier lexer::parse_identifier(const char8* input,
                                                     identifier_kind kind) {
      const char8* begin = input;
      const char8* end = this->parse_identifier_fast_only(input);
      if (*end == u8'\\' || (kind == identifier_kind::jsx && *end == u8'-') ||
          !this->is_ascii_character(*end)) {
        return this->parse_identifier_slow(end,
                                           /*identifier_begin=*/begin, kind);
      } else {
        return parsed_identifier{
            .after = end,
            .normalized = make_string_view(begin, end),
            .escape_sequences = {},
        };
      }
    }

Rust: https://github.com/quick-lint/cpp-vs-rust/blob/f8d31341f5cac...

    fn parse_identifier(
        &mut self,
        input: *const u8,
        kind: IdentifierKind,
    ) -> ParsedIdentifier<'alloc, 'code> {
        let begin: *const u8 = input;
        let end: *const u8 = self.parse_identifier_fast_only(input);
        let end_c: u8 = unsafe { *end };
        if end_c == b'\\'
            || (kind == IdentifierKind::JSX && end_c == b'-')
            || !is_ascii_code_unit(end_c)
        {
            self.parse_identifier_slow(end, /*identifier_begin=*/ begin, kind)
        } else {
            ParsedIdentifier {
                after: end,
                normalized: unsafe { slice_from_begin_end(begin, end) },
                escape_sequences: None,
            }
        }
    }