Hacker News
by moffkalast 1075 days ago
Tbf, anything but bytes is a cloudy concept and not accepting that is just making things harder for yourself for literally no reason. If you have native arrays, you can have native strings, end of discussion.

C doesn't have strings because there wasn't a well-accepted library for them in the Precambrian era when it was being designed. Otherwise it would have had them; they weren't left out because of some ideological fit-for-purpose nonsense. It was supposed to be a high-level language, as high as they could go with systems being as crap as they were back then.


> Tbf, anything but bytes is a cloudy concept and not accepting that is just making things harder for yourself for literally no reason.

What I mean is that strings are an abstract concept (roughly, a sequence of characters that can be indexed and possibly mutated) with many possible implementations and runtime characteristics. IMO, for C it wouldn't make much sense to privilege one of them over the others, apart from the obvious statically allocated fixed-size array, where the compiler can already do all the work.

> If you have native arrays, you can have native strings, end of discussion.

C does have native arrays (of fixed size) and so does have native strings (of fixed size). It even has convenient syntax for instantiating these strings -- string literals. There is even syntax to concatenate them (at compile time) by simple juxtaposition in the source code.
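A minimal C11 sketch of the two points above: a string literal initializes a fixed-size array whose size the compiler knows, and adjacent literals are merged into a single array at compile time. The variable names are illustrative only.

```c
#include <assert.h>
#include <string.h>

/* A string literal initializes a fixed-size char array; the compiler
   computes the size, including the terminating '\0'. */
static const char greeting[] = "Hello";                /* const char[6] */

/* Adjacent literals are concatenated at compile time (translation phase 6),
   so this is one array and no runtime work is involved. */
static const char message[] = "Hello, " "world" "!";   /* const char[14] */

/* Both sizes are compile-time constants and can be checked statically. */
_Static_assert(sizeof greeting == 6, "5 chars + terminator");
_Static_assert(sizeof message == 14, "13 chars + terminator");
```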

> C doesn't have strings because there wasn't a well-accepted library

Nope, the reason is that "strings" (even if you assume their representation is strictly a tuple of size + pointer to a contiguous character buffer) need storage whose size is unknown at compile time.

Storage needs to be managed, and that management can either be done manually or automatically by the language runtime. C opted not to provide any of the latter except for stack variables and global variables (whose lifetime is that of the process) -- and neither of those supports dynamically sized allocations (VLAs aside).
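To make the storage argument concrete, here is a sketch (the function name concat is hypothetical) of joining two strings whose lengths are only known at runtime. The result has to come from the heap, and freeing it is the caller's job, which is exactly the management the language leaves to you.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Concatenates two strings whose lengths are only known at runtime.
   Neither a stack array nor a global array can be sized for this
   (VLAs aside), so the result comes from the heap and the caller owns it. */
char *concat(const char *a, const char *b)
{
    size_t la = strlen(a), lb = strlen(b);
    char *out = malloc(la + lb + 1);   /* +1 for the terminating '\0' */
    if (out == NULL)
        return NULL;                   /* allocation can fail; caller must check */
    memcpy(out, a, la);
    memcpy(out + la, b, lb + 1);       /* also copies b's terminator */
    return out;
}

/* Usage: every successful call must be paired with a free().
       char *s = concat(first, last);
       if (s) { puts(s); free(s); }                               */
```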

If strings were simply an issue of library or syntax support, all the strings you could envision would exist by now. But even the best string libraries (for C) are a bad joke and do almost nothing to make string processing more ergonomic.

Honestly, I don't miss any of that; almost all of my string needs are well covered by printf/snprintf. I once worked on my own text editor, and for it developed a text rope data structure that kept track of the line endings and the number of Unicode code leader bytes etc, and supported efficient string operations at almost arbitrary sizes (easily gigabytes). Most languages probably don't give you this with their built-in "string" type.
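The rope itself isn't shown, so the following is only an assumed illustration of the kind of per-chunk bookkeeping described: counting line endings and UTF-8 leader bytes (one per code point) in a chunk, with summaries that add up so an inner node can store the totals of its children. ChunkStats and its functions are hypothetical names, and valid UTF-8 input is assumed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary a rope leaf might cache, so that line and code point
   positions can be located by summing summaries instead of rescanning text. */
typedef struct {
    size_t bytes;       /* length of the chunk in bytes */
    size_t newlines;    /* number of '\n' bytes */
    size_t codepoints;  /* number of UTF-8 leader bytes, i.e. code points */
} ChunkStats;

/* In UTF-8 every byte that is not a continuation byte (10xxxxxx) starts a
   code point, so counting non-continuation bytes counts code points
   (assuming valid UTF-8). */
ChunkStats chunk_stats(const unsigned char *p, size_t n)
{
    ChunkStats s = { n, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        if (p[i] == '\n')
            s.newlines++;
        if ((p[i] & 0xC0) != 0x80)
            s.codepoints++;
    }
    return s;
}

/* Summaries combine by plain addition, which is what lets an inner rope
   node store the totals of its children. */
ChunkStats chunk_stats_add(ChunkStats a, ChunkStats b)
{
    ChunkStats s = { a.bytes + b.bytes, a.newlines + b.newlines,
                     a.codepoints + b.codepoints };
    return s;
}
```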

What helps, though, is to pick your tools according to the job. There is a lot of scripting or web work that C is simply not suited for. For systems programming, however, it does fine, giving you a lot of control while automating the most common headaches (register allocation, function calls, computation of data member offsets, etc.).

> It was supposed to be a high level language, as high as they could go at the time anyway.

That is patently false. It was supposed to be a language that let them program systems more conveniently, and it was based on prior art. (There are certain figures on HN whom I expect to pop up immediately and add that the art was already considerably more advanced and "high level" at the time, and that C wasn't innovating on any front.)

> What I mean is that strings are an abstract concept (roughly, a sequence of characters that can be indexed and possibly mutated) with many possible implementations and runtime characteristics.

Well alright maybe a string is more like a big int or big decimal than a long, but all of these are commonly used enough that they should be part of just about any language. Just treat it as a vector/ArrayList that auto-expands to the new length when it exceeds its capacity, and that's that. Maybe having hidden mallocs introduces unclear pitfalls, but frankly so do raw pointers and everything else C does, so it's not like it would be much worse in that regard.
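Roughly what the suggested auto-expanding string looks like in C, as a sketch with hypothetical names (DynStr, dynstr_append, dynstr_free). The reallocation is hidden inside the append, but the caller still has to call dynstr_free, which is the lifetime question the reply raises.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable string: behaves like a vector, doubling its
   capacity whenever an append would overflow it. */
typedef struct {
    char  *data;  /* '\0'-terminated when non-NULL */
    size_t len;   /* bytes in use, excluding the terminator */
    size_t cap;   /* bytes allocated, including room for the terminator */
} DynStr;

/* Appends n bytes from src. Returns 0 on success, -1 on allocation failure
   (s is then left unchanged). Overflow checks on cap are omitted. */
int dynstr_append(DynStr *s, const char *src, size_t n)
{
    if (s->len + n + 1 > s->cap) {
        size_t cap = s->cap ? s->cap : 16;
        while (s->len + n + 1 > cap)
            cap *= 2;                      /* doubling: amortized O(1) appends */
        char *p = realloc(s->data, cap);   /* realloc(NULL, n) acts like malloc */
        if (p == NULL)
            return -1;
        s->data = p;
        s->cap = cap;
    }
    memcpy(s->data + s->len, src, n);
    s->len += n;
    s->data[s->len] = '\0';
    return 0;
}

/* The one call the caller can never skip. */
void dynstr_free(DynStr *s)
{
    free(s->data);
    s->data = NULL;
    s->len = s->cap = 0;
}
```

Usage: start from DynStr s = {0};, append as often as needed, then call dynstr_free(&s).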

> C does have native arrays (of fixed size) and so does have native strings (of fixed size). It even has convenient syntax for instantiating these strings -- string literals. There is even syntax to concatenate them (at compile time) by simple juxtaposition in the source code.

Yeaaah, that's almost like a preprocessor thing though, isn't it? Unless there's a separate type I've never seen they're still just a char* that's made at compile time.

> text rope data structure that kept track of the line endings and the number of Unicode code leader bytes etc, and supported efficient string operations at almost arbitrary sizes (easily gigabytes). Most languages probably don't give you this with their built-in "string" type.

Well, there certainly are esoteric cases where having full control can bring major performance benefits, but one could always fall back to a char array if it's actually needed and still have normal strings for 99.9% of daily use.

And well, Microsoft shipped the blazingly fast VS Code that's somehow written in JavaScript and runs on the molasses that is Electron, while outperforming Sublime Text, which is native C++. If done right, the VM/compiler should be smart enough to optimize these repetitive things according to best known practices instead of you having to do it by hand over and over and over and over... and probably messing it up half the time.

> If strings were simply an issue of library or syntax support, all the strings you could envision would exist by now. But even the best string libraries (for C) are a bad joke and do almost nothing to make string processing more ergonomic.

"No way to fix this" says only language in existence where this is consistently terrible.

Well, aside from assembly, but that one gets a pass, since calling integers "words" is a special kind of madness that's not to be messed with.

> Well alright maybe a string is more like a big int or big decimal than a long, but all of these are commonly used enough that they should be part of just about any language.

There is no way to fit what you describe in C. A string needs storage and lifetime management -- not only do you have to create new strings, you also have to delete strings that become unreferenced. There is no way to wrap a nice syntax around this in C to just make temporary strings that get automatically cleaned up. You would have to introduce a dependency on a global heap allocator, and introduce reference counting or similar machinery, and C is simply not about doing that.
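A sketch of the reference-counting machinery mentioned above, with hypothetical names (RcStr, rcstr_new, rcstr_retain, rcstr_release). It shows what is still missing: nothing in C calls rcstr_release automatically when a variable goes out of scope, so the bookkeeping stays manual. The count is not atomic, so this is single-threaded only.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical immutable, reference-counted string. */
typedef struct {
    size_t refs;
    size_t len;
    char   text[];   /* flexible array member (C99), '\0'-terminated */
} RcStr;

/* Depends on the global heap allocator, as the comment points out. */
RcStr *rcstr_new(const char *src)
{
    size_t len = strlen(src);
    RcStr *s = malloc(sizeof *s + len + 1);
    if (s == NULL)
        return NULL;
    s->refs = 1;
    s->len = len;
    memcpy(s->text, src, len + 1);
    return s;
}

/* Sharing is cheap: bump the count and hand out the same pointer. */
RcStr *rcstr_retain(RcStr *s)
{
    s->refs++;
    return s;
}

/* Every owner must call this explicitly; the last release frees. */
void rcstr_release(RcStr *s)
{
    if (s != NULL && --s->refs == 0)
        free(s);
}
```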

And with a more structured approach, that missing syntax doesn't hurt that much. It can feel good to know what lives where and how the storage is managed. If you don't like it, go look someplace else. But don't critique C for concentrating on more basic and essential abstractions.

> Unless there's a separate type I've never seen they're still just a char* that's made at compile time.

Compile time is what I said, right? And it doesn't make a char pointer but a fixed-size char array.
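A small check of the claim that a literal is an array rather than a pointer: sizeof applied to the literal yields the array size, while a pointer initialized from it only knows its own size. Like any array, a literal decays to a pointer in most expressions; the exceptions are the operand of sizeof, the operand of unary &, and a literal used to initialize an array. The names below are illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* "Hello" has type char[6] (five characters plus '\0'), so sizeof sees the
   whole array and the result is a compile-time constant. */
enum { HELLO_SIZE = sizeof "Hello" };   /* 6 */

/* Assigned to a pointer, the literal decays and its length is lost. */
static const char *hello_ptr = "Hello";

size_t size_of_pointer(void)
{
    return sizeof hello_ptr;            /* pointer size, e.g. 8 on LP64 */
}
```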

You can have what you want in C++ thanks to RAII, like std::string. Whether the result is worthwhile is another question.

> If done right, the VM/compiler should be smart enough to optimize these repetitive things according to best known practices

User input isn't performance-sensitive at all. You have a human in front who's sending maybe a dozen bytes per second at peak. Any language can handle that.

For visual output you're sitting on top of a browser rendering engine that's highly optimized in C/C++/Rust etc. Billions of dollars have been put into it. It's still certainly possible to use its API (the DOM and CSS) the wrong way and make it dog slow.

The efficiency of making modifications to the data model underneath is predicated on the selection of data structures. If those are wrong, it will be slow no matter the selection of language or VM/compiler.

The text buffer is certainly one central data structure that has to be fast. https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r.... One thing to try is finding the "string" in there. See also the "Why not C++" section.
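To illustrate how much the data structure matters, here is a flat piece table in C. The linked post describes a piece tree (a piece table whose pieces are kept in a balanced tree); this sketch uses a plain array of pieces and fixed capacities for brevity, and is not VS Code's implementation. All names are hypothetical. No piece is a "string" in the usual sense: the document is just a list of (buffer, start, length) references.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    int    in_add;  /* 0 = original buffer, 1 = add buffer */
    size_t start;   /* offset into that buffer */
    size_t len;     /* length in bytes */
} Piece;

#define MAX_PIECES 256
#define ADD_CAP    4096

typedef struct {
    const char *orig;           /* original text, never modified */
    char   add[ADD_CAP];        /* append-only buffer for inserted text */
    size_t add_len;
    Piece  pieces[MAX_PIECES];  /* document = concatenation of pieces */
    size_t npieces;
} PieceTable;

void pt_init(PieceTable *pt, const char *orig)
{
    pt->orig = orig;
    pt->add_len = 0;
    pt->pieces[0] = (Piece){ 0, 0, strlen(orig) };
    pt->npieces = pt->pieces[0].len > 0 ? 1 : 0;
}

/* Inserts text at byte offset pos, splitting at most one piece.
   Returns 0 on success, -1 if pos is past the end or a capacity is hit. */
int pt_insert(PieceTable *pt, size_t pos, const char *text)
{
    size_t n = strlen(text), i = 0, off = 0;
    if (n == 0)
        return 0;
    /* Find the piece containing pos (i == npieces if pos is at the end). */
    while (i < pt->npieces && off + pt->pieces[i].len <= pos)
        off += pt->pieces[i++].len;
    if (i == pt->npieces && pos > off)
        return -1;
    if (pt->add_len + n > ADD_CAP || pt->npieces + 2 > MAX_PIECES)
        return -1;

    Piece added = { 1, pt->add_len, n };
    memcpy(pt->add + pt->add_len, text, n);
    pt->add_len += n;

    size_t k = pos - off;               /* offset of pos inside piece i */
    if (k > 0) {                        /* split piece i around the insert */
        Piece right = pt->pieces[i];
        right.start += k;
        right.len -= k;
        pt->pieces[i].len = k;
        memmove(&pt->pieces[i + 2], &pt->pieces[i + 1],
                (pt->npieces - i - 1) * sizeof(Piece));
        pt->pieces[i + 1] = added;
        pt->pieces[i + 2] = right;
        pt->npieces += 2;
    } else {                            /* insert before piece i */
        memmove(&pt->pieces[i + 1], &pt->pieces[i],
                (pt->npieces - i) * sizeof(Piece));
        pt->pieces[i] = added;
        pt->npieces += 1;
    }
    return 0;
}

/* Copies the document into out (cap bytes including '\0').
   Returns its length, or (size_t)-1 if out is too small. */
size_t pt_text(const PieceTable *pt, char *out, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return (size_t)-1;
    for (size_t i = 0; i < pt->npieces; i++) {
        const Piece *p = &pt->pieces[i];
        if (len + p->len + 1 > cap)
            return (size_t)-1;
        memcpy(out + len, (p->in_add ? pt->add : pt->orig) + p->start, p->len);
        len += p->len;
    }
    out[len] = '\0';
    return len;
}
```

Edits never touch the original text, and an insert costs a scan over the pieces plus at most one split; roughly speaking, the piece tree in the post replaces that linear scan with a tree lookup.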

The fundamental type isn't what you're thinking of as "string", roughly C++'s std::string.

What's fundamental is the reference type, because that's the type you're going to use much more often. This is correctly a slice type: it refers to some bytes. I said above that &[u8] is the least capability that's reasonable, and that's what std::string_view at last gives C++.
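In C, the slice type being described is just a pointer plus a length. A minimal sketch with hypothetical names (StrView, sv_from_cstr, sv_sub, sv_eq); like &str or std::string_view it owns nothing, so the underlying bytes must outlive the view.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Borrowed view of some bytes: no ownership, no allocation. */
typedef struct {
    const char *ptr;
    size_t      len;
} StrView;

StrView sv_from_cstr(const char *s)
{
    return (StrView){ s, strlen(s) };
}

/* Sub-slicing only adjusts pointer and length; nothing is copied.
   Out-of-range bounds are clamped to keep the sketch total. */
StrView sv_sub(StrView v, size_t start, size_t end)
{
    if (end > v.len)
        end = v.len;
    if (start > end)
        start = end;
    return (StrView){ v.ptr + start, end - start };
}

int sv_eq(StrView a, StrView b)
{
    return a.len == b.len && (a.len == 0 || memcmp(a.ptr, b.ptr, a.len) == 0);
}
```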

This is one of the many fundamental choices Rust made that's much cleverer than it looks. The str type (some bytes which are promised to be UTF-8 text) is a language feature, a slight improvement on [u8] that's core to the language; String, however, is just a library type, albeit a very heavily optimized one. A $1 microcontroller might well have some use for &'static str, the immutable slice reference, e.g. to refer to some text baked into its firmware, but it doesn't have a heap allocator (it's not about to waste precious RAM on a dynamic allocator), and so it doesn't need String.
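C has no type that carries the "promised to be UTF-8" guarantee of str, but the promise can be checked once at a boundary. A minimal validator sketch (utf8_valid is a hypothetical name, not Rust's implementation) following the UTF-8 rules of RFC 3629: it rejects stray continuation bytes, truncated sequences, overlong encodings, surrogates and values above U+10FFFF.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if s[0..n) is valid UTF-8, 0 otherwise. */
int utf8_valid(const unsigned char *s, size_t n)
{
    size_t i = 0;
    while (i < n) {
        unsigned char b = s[i];
        size_t need;         /* number of continuation bytes */
        unsigned long cp;    /* decoded code point */
        if (b < 0x80) {
            i++;             /* ASCII */
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) {
            need = 1; cp = b & 0x1F;
        } else if (b >= 0xE0 && b <= 0xEF) {
            need = 2; cp = b & 0x0F;
        } else if (b >= 0xF0 && b <= 0xF4) {
            need = 3; cp = b & 0x07;
        } else {
            return 0;        /* 0x80..0xC1, 0xF5..0xFF cannot start a sequence */
        }
        if (n - i <= need)
            return 0;        /* truncated sequence */
        for (size_t k = 1; k <= need; k++) {
            if ((s[i + k] & 0xC0) != 0x80)
                return 0;    /* expected a continuation byte */
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if ((need == 2 && cp < 0x800) ||        /* overlong 3-byte form */
            (need == 3 && cp < 0x10000) ||      /* overlong 4-byte form */
            (cp >= 0xD800 && cp <= 0xDFFF) ||   /* UTF-16 surrogate */
            cp > 0x10FFFF)
            return 0;
        i += need + 1;
    }
    return 1;
}
```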

The talking point is a string type that can be used with some convenience. Say, joining several of them together at runtime with '+'. Or whatever, maybe just a function call. I was explaining why that doesn't work just like that.

The str type you mention is a nice feature, and in the future, once I have switched to Rust or whatever, I might use it to write my programs in 5 fewer lines.

Meanwhile, in C I have to use C's "slice" type: char x[] = "Hello". I know it's not quite as good, since if I were to pass this around, I would have to make a pointer + length representation for it. If I needed that, it could be automated for string literals: typedef struct { const char *buffer; size_t size; } String; #define STRING(lit) ((String){lit "", sizeof(lit) - 1}). Or char buffer[256]; my_api(buffer, sizeof buffer);
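The pointer + length idea from the comment, written out as compilable C. The String type and STRING macro follow the comment; the body of my_api is a hypothetical stand-in that formats into caller-provided storage with snprintf.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

typedef struct {
    const char *buffer;
    size_t      size;   /* length in bytes, excluding the terminator */
} String;

/* lit "" only compiles if lit is a string literal, so sizeof always
   measures the literal's array, never a pointer. */
#define STRING(lit) ((String){ lit "", sizeof(lit) - 1 })

/* Hypothetical API in the my_api(buffer, sizeof buffer) style: the caller
   owns the storage, snprintf never writes past size bytes, and the return
   value is the length the full result would have needed. */
int my_api(char *buffer, size_t size)
{
    return snprintf(buffer, size, "%s=%d", "answer", 42);
}
```

A caller would write String s = STRING("Hello"); to get { "Hello", 5 }, or char buffer[256]; my_api(buffer, sizeof buffer); and compare the return value with sizeof buffer to detect truncation.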

For the few situations where string manipulation is required, it's just not a real problem.