by pansa2 137 days ago
Yeah, you could argue that choosing C is just choosing a particular subset of C++.

The main difference from choosing a different subset, e.g. “Google C++” (i.e. writing C++ according to the Google style guide), is that the compiler enforces that you stick to the subset.

3 comments

C's string handling is so abominably terrible that sometimes all people really need is "C with std::string".

Oh, and smart pointers too.

And hash maps.

Vectors too while we're at it.

I think that's it.

When I developed D, a major priority was string handling. I was inspired by Basic, which had very straightforward, natural strings. The goal was to be as good as Basic strings.

And it wasn't hard to achieve. The idea was to use length-delimited strings rather than 0-terminated ones. This means that a slice of a string is itself a string, which is a superpower. No more did one have to constantly allocate memory for a slice, and then keep track of that memory.

Length-delimited strings also sped up string manipulation a lot. One no longer had to scan a string to find its length. This is a big deal for memory caching.

Static strings are length delimited too, but also have a 0 at the end, which makes it easy to pass string literals to C functions like printf. And, of course, you can append a 0 to a string anytime.
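For anyone who hasn't seen the idea spelled out, here is a minimal sketch in C of a length-delimited string with slicing. The `str_slice` type and the `slice_*` helpers are hypothetical names made up for illustration; this is not how D implements it, just the same pointer-plus-length idea.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical length-delimited string: a pointer plus a length.
   No terminating 0 is required. */
typedef struct {
    const char *ptr;
    size_t len;
} str_slice;

/* Wrap a 0-terminated C string. This is the only place the
   string is scanned to find its length. */
static str_slice slice_from_cstr(const char *s) {
    str_slice r = { s, strlen(s) };
    return r;
}

/* Take the sub-slice [begin, end). No allocation and no copy:
   the result points into the same memory as the original. */
static str_slice slice_sub(str_slice s, size_t begin, size_t end) {
    assert(begin <= end && end <= s.len);
    str_slice r = { s.ptr + begin, end - begin };
    return r;
}

/* Compare two slices by content. */
static int slice_eq(str_slice a, str_slice b) {
    return a.len == b.len && memcmp(a.ptr, b.ptr, a.len) == 0;
}
```

A slice can still be handed to printf without appending a 0, using the precision form: `printf("%.*s\n", (int)w.len, w.ptr);`.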

Just want to off-topic-nerd-out for a second and thank you for Empire.
You're welcome!

One of the fun things about Empire is one isn't out to save humanity, but to conquer! Hahahaha.

BTW, one of my friends is using Claude Code to generate an Empire clone by feeding it the manual. Lots of fun!

I agree on the former two (std::string and smart pointers) because they can't be nicely implemented without some help from the language itself.

The latter two (hash maps and vectors), though, are just compound data types that can be built on top of standard C. All it would need is to agree on a new common library, more modern than the one designed in the 70s.

I think a vec is important for the same reason a string is: you need to be able to properly get the length, and you need standardized ways to push/pop that don’t require manual bounds checking and calls to realloc.

Hash maps are mostly only important because everyone ought to standardize on a way of hashing keys.

But I suppose they can both be “bring your own”… to me it’s more that these types are so fundamental and so “table stakes” that having one base implementation of them guaranteed by the language’s standard lib is important.
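To make the "bring your own" version concrete, here is a rough sketch of what such a vec could look like in plain C. `int_vec` and its functions are hypothetical names, it only handles `int`, and it ignores capacity overflow, so treat it as an illustration rather than a library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical growable array of int. Zero-initialize before use. */
typedef struct {
    int *data;
    size_t len;
    size_t cap;
} int_vec;

/* Append a value, doubling the buffer when it is full.
   Returns 0 on success, -1 if allocation fails. */
static int int_vec_push(int_vec *v, int value) {
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int *p = realloc(v->data, new_cap * sizeof *p);
        if (!p) return -1;
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Remove the last element into *out.
   Returns 0 on success, -1 if the vector is empty. */
static int int_vec_pop(int_vec *v, int *out) {
    if (v->len == 0) return -1;
    *out = v->data[--v->len];
    return 0;
}

/* Release the buffer and reset to the empty state. */
static void int_vec_free(int_vec *v) {
    free(v->data);
    v->data = NULL;
    v->len = v->cap = 0;
}
```

The caller never touches realloc or checks bounds by hand, which is the point being made above. The cost is that every project ends up with its own slightly different version of this.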

why not std::string?
You can surely create a std::string-like type in C, call it "newstring", and write functions that accept and return newstrings, and re-implement the whole standard library to work with newstrings, from printf() onwards. But you'll never have the comfort of newstring literals. The nice syntax with quotes is tied to zero-terminated strings. Of course you can litter your code with preprocessor macros, but it's inelegant and brittle.
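For reference, the macro approach mentioned above might look something like this sketch. `newstring` is the hypothetical type from the comment and `NS` is a made-up macro name; it relies on C99 compound literals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical length-delimited string type. */
typedef struct {
    const char *ptr;
    size_t len;
} newstring;

/* NS("abc") builds a newstring from a string literal.
   The "" lit "" concatenation makes the macro fail to compile
   when given anything other than a string literal.
   sizeof counts the terminating 0, hence the - 1. */
#define NS(lit) ((newstring){ "" lit "", sizeof(lit) - 1 })
```

It works, and it even gets the length right for literals with embedded zeros, but every literal in the code base now has to be wrapped in `NS(...)`, which is the inelegance being complained about.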
Because C wants to run on bare metal, an allocating type like C++ std::string (or Rust's String) isn't affordable for what you mean here.

I think you want the string slice reference type, what C++ calls std::string_view and Rust calls &str. This type is just two facts about some text: where it is in memory and how long it is (or, equivalently, where it ends; storing the length is often slightly faster in practice on real machines, so if you're making a new one, do that).

In C++ this is maybe non-obvious because it took until C++17 for C++ to get this type (WG21 are crazy), but this is the type you actually want as a fundamental, not an allocating type like std::string.

Alternatively, if you're not yet ready to accept that all text should use UTF-8 encoding (and maybe C isn't ready for that yet), you don't want this type; you just want byte slice references: Rust's &[u8] or C++'s std::span<char>.

If only WG14 added something similar to C.

Yes, SDS exists, however vocabulary types are quite relevant for adoption at scale.

It's a class, so it doesn't work in C.
Sure, but you can have a similar string abstraction in C. What would you miss? The overloaded operators?
Automatic memory accounting: construct/copy/destruct. You can't abstract these in C. You always have to call i_copied_the_string(&string) after copying the string, and you always have to call the_string_is_out_of_scope_now(&string) just before it goes out of scope.
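A small sketch of what that manual bookkeeping looks like in practice. `owned_str` and its three functions are hypothetical names; they stand in for what a constructor, copy constructor and destructor would do automatically in C++.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owning string: the struct owns its heap buffer. */
typedef struct {
    char *data;
    size_t len;
} owned_str;

/* "Constructor": allocate and copy from a 0-terminated string.
   Returns 0 on success, -1 if allocation fails. */
static int owned_str_init(owned_str *s, const char *src) {
    size_t n = strlen(src);
    char *p = malloc(n + 1);
    if (!p) return -1;
    memcpy(p, src, n + 1);
    s->data = p;
    s->len = n;
    return 0;
}

/* "Copy constructor": a plain struct assignment would make both
   structs share one buffer, so a deep copy has to be requested by hand. */
static int owned_str_copy(owned_str *dst, const owned_str *src) {
    char *p = malloc(src->len + 1);
    if (!p) return -1;
    memcpy(p, src->data, src->len + 1);
    dst->data = p;
    dst->len = src->len;
    return 0;
}

/* "Destructor": must be called by hand before the variable
   goes out of scope, or the buffer leaks. */
static void owned_str_destroy(owned_str *s) {
    free(s->data);
    s->data = NULL;
    s->len = 0;
}
```

Nothing stops a caller from writing `b = a;` and later destroying both, which frees the same buffer twice. The compiler offers no help here.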
The C++ std::string is both very complicated mechanically and underspecified, which is why Raymond Chen's article about std::string has to explain three different types (one for each of the three popular C++ stdlib implementations) and still got some details wrong resulting in a cycle of corrections.

So that wouldn't really fit C very well and I'd suggest that Rust's String, which is essentially just Vec<u8> plus a promise that this is a UTF-8 encoded string, is closer.

Yeah, WG14 has had enough time to provide safer alternatives for strings and arrays in C, but that isn't a priority, apparently.
Add concurrency and you've more or less come up with the same list C's own creator came up with when he started working on a new language.
And constructors and destructors to be able to use those vectors and hash maps properly without worrying about memory leaks.

And const references.

And lambdas.

C is not a subset of C++; there are some subtle things you can do in C that are not valid C++.
It is when compared with C89; also, ISO C++ requires inclusion of the ISO C standard library.

The differences are the usual that occur with guest languages, in this case the origin being UNIX and C at Bell Labs, eventually each platform goes its own merry way and compatibility slowly falls apart with newer versions.

With regard to C89, the main differences are the struct and union naming rules, () meaning void instead of anything goes, the ?: precedence rules, and fewer implicit conversions being allowed, such as those from void pointers.
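A few of these can be shown in a short sketch. Everything below is valid C, but a C++ compiler rejects it or gives a different answer. The function names are made up for illustration, and the `class` and `sizeof('a')` cases are extra well-known examples beyond the list above.

```c
#include <assert.h>
#include <stdlib.h>

/* Struct tags live in their own namespace in C, so a tag and a
   typedef may share a name. C++ rejects the typedef below. */
struct point { int x; };
typedef int point;

static int tag_namespace_demo(void) {
    struct point pt = { 4 };
    point n = pt.x;   /* "point" here is the typedef, i.e. int */
    return n;
}

static int c_only_demo(void) {
    /* C converts void* to int* implicitly; C++ requires a cast. */
    int *p = malloc(sizeof *p);
    if (!p) return -1;
    /* "class" is an ordinary identifier in C, a keyword in C++. */
    int class = 3;
    *p = class;
    int result = *p;
    free(p);
    return result;
}

/* A character constant has type int in C and type char in C++,
   so this returns 1 in C. */
static int char_constant_is_int(void) {
    return sizeof('a') == sizeof(int);
}
```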

Some subtle and some not so subtle.
Linters etc. validate the subset you're choosing to use for your project too.