> if a string literal contains only UTF-8 characters and you assign it to a byte array or span, it gets encoded as UTF-8.

I write a bunch of C# for my job, but I'm far from an expert in the language. As I read it, this statement is redundant, which makes me fairly sure it's trying to communicate something the authors thought was "obvious" but isn't.

* A string literal - so, realistically, some Unicode text, right? All the other encodings anybody actually uses can be transcoded to Unicode, so they are just Unicode (in a different encoding).
* contains only UTF-8 characters - UTF-8 is an encoding of Unicode, so this just means Unicode again.

I'm guessing C# can actually put something that isn't Unicode in a String for some reason? But what that might be is unexplained. Can you... emit arbitrary bytes? But how, when the native encoding (UTF-16) isn't even byte-oriented? What would that even mean? Maybe you can emit the rare Unicode "non-characters" like U+FFFF? But you can express those just fine in UTF-8, so who cares? Or perhaps it's as simple as C# letting you write literals that are sequences of 16-bit code units but aren't valid UTF-16?

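The last guess appears to be the relevant one, and the quoted proposal's mention of a "malformed UTF16 string" points the same way. A C# `string` is a sequence of UTF-16 code units (`char` values), and the compiler accepts literals such as `"\uD800"` that contain an unpaired surrogate. Such a string is not valid UTF-16 and has no UTF-8 encoding. Here is a minimal sketch (C# 9+ top-level statements; `IsWellFormedUtf16` is a hypothetical helper of my own, not a framework API) that tells the two cases apart:

```csharp
using System;

// A C# string is a sequence of 16-bit UTF-16 code units (char values).
// The compiler does not require those code units to form valid UTF-16.
string wellFormed = "caf\u00E9 \U0001F600"; // 'é' plus an emoji stored as a surrogate pair
string malformed = "abc\uD800def";          // unpaired high surrogate: compiles, but is not valid UTF-16

Console.WriteLine(wellFormed.Length);             // 7: the emoji takes two code units
Console.WriteLine(IsWellFormedUtf16(wellFormed)); // True
Console.WriteLine(IsWellFormedUtf16(malformed));  // False

// Hypothetical helper: true when every surrogate code unit is part of a high+low pair.
static bool IsWellFormedUtf16(string s)
{
    for (int i = 0; i < s.Length; i++)
    {
        char c = s[i];
        if (char.IsHighSurrogate(c))
        {
            if (i + 1 < s.Length && char.IsLowSurrogate(s[i + 1]))
            {
                i++; // valid pair: skip the low half
                continue;
            }
            return false; // high surrogate not followed by a low surrogate
        }
        if (char.IsLowSurrogate(c))
        {
            return false; // low surrogate with no preceding high surrogate
        }
    }
    return true;
}
```

Note that non-characters like U+FFFF pass this check: they are ordinary scalar values with a perfectly good UTF-8 encoding. Only unpaired surrogates make a `string` unrepresentable in UTF-8.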
> The language will allow conversions between string constants and byte sequences where the text is converted into the equivalent UTF8 byte representation. Specifically the compiler will allow `string_constant_to_UTF8_byte_representation_conversion` - implicit conversions from string constants to `byte[]`, `Span<byte>`, and `ReadOnlySpan<byte>`. A new bullet point will be added to the implicit conversions §10.2 section. This conversion is not a standard conversion §10.4.
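To make the quoted rule concrete, here is a rough runtime stand-in for what the compiler does at compile time for a constant. It assumes that, for well-formed input, the compiler's "equivalent UTF8 byte representation" matches what `System.Text.UTF8Encoding` produces at run time; this is an illustration, not the compiler's actual code path. Passing `throwOnInvalidBytes: true` makes the encoder reject malformed UTF-16 instead of silently substituting a replacement character (`Convert.ToHexString` needs .NET 5 or later):

```csharp
using System;
using System.Text;

// Strict UTF-8: no BOM, and throw on malformed UTF-16 input (e.g. a lone surrogate)
// instead of substituting a replacement character.
var strictUtf8 = new UTF8Encoding(encoderShouldEmitUTF8Identifier: false, throwOnInvalidBytes: true);

// Well-formed text maps to exactly one UTF-8 byte sequence.
byte[] bytes = strictUtf8.GetBytes("caf\u00E9");   // "café"
ReadOnlySpan<byte> span = bytes;                    // byte[] converts implicitly to ReadOnlySpan<byte>
Console.WriteLine(Convert.ToHexString(bytes));      // 636166C3A9 (é -> C3 A9)
Console.WriteLine(span.Length);                     // 5 bytes for 4 chars

// A surrogate pair (two chars) becomes a single 4-byte UTF-8 sequence.
byte[] emoji = strictUtf8.GetBytes("\U0001F600");
Console.WriteLine(Convert.ToHexString(emoji));      // F09F9880

// Malformed UTF-16 has no UTF-8 representation, so the strict encoder throws.
bool rejected = false;
try
{
    strictUtf8.GetBytes("abc\uD800def");
}
catch (EncoderFallbackException)
{
    rejected = true;
}
Console.WriteLine(rejected);                        // True
```

The strict encoder is the closest runtime analogue to a compile-time error: well-formed input has exactly one UTF-8 encoding, while malformed input has none, so the only honest options are to reject it or to alter the text.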
> When the input text for the conversion is a malformed UTF16 string then the language will emit an error: