| I don't understand your complaints. You clearly have some task in mind that you wish to perform: why not tell me what it is?
> Please show a code example of changing European to African in this sentence in your language of choice, working on the bytes in any multi-byte encoding: מהי מהירות האווירית של סנונית ארופאית ללא משא?
I don't see the string 'European' in that sentence; it seems to be composed solely of Hebrew characters.
Edit to attempt to answer your question:
#include <stddef.h>
/* byte offsets of a match: [start, end) */
typedef struct {
    size_t start;
    size_t end;
} match;
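/* Find the first occurrence of substr in str, comparing bytes.
   On success, store the byte range [start, end) in *m and return 1. */
int findsn(const char* str, const char* substr, match* m) {
    for( size_t c_i = 0; str[c_i] != '\0'; c_i++ ) {
        size_t s_i = 0;
        /* advance while the bytes at this offset keep matching */
        while( substr[s_i] != '\0' && str[c_i + s_i] == substr[s_i] ) s_i++;
        if( substr[s_i] != '\0' ) continue; /* mismatch: try the next offset */
        m->start = c_i;
        m->end = c_i + s_i;
        return 1;
    }
    return 0;
}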
char* replacesn(char* str, char* needle, char* rpl) {
match m;
if( findsn(str, needle, &m) ) {
splicesn(str, m.start, m.end, rpl);
}
return str;
}
splicesn should be obvious, and you normalise your strings before calling replacesn. This is just me crappily re-implementing a fraction of the wchar API without checking MSDN.
Edit 2:
> Is each application to maintain their own dictionary of code points?
No, you use the system/standard library for composing/decomposing/normalising codepoints.
> If the map is to be in a library, then why not have it in the language itself?
Why not indeed? What a great idea. |
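For anyone following along, here is a rough sketch of what the splice step could look like when it works purely on bytes. None of these names come from the snippet above: `find_bytes`, `splice_bytes` and `replace_first` are hypothetical, `strstr` stands in for the hand-rolled search, and the result is a freshly allocated buffer rather than an in-place edit (an assumption made so that the replacement may be longer than the match). It also assumes both strings are valid UTF-8 and already normalised the same way, as mentioned above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Byte range [start, end) of a match. */
typedef struct {
    size_t start;
    size_t end;
} match;

/* Hypothetical helper: byte-wise search for needle in str.
   Returns 1 and fills *m on success, 0 if needle does not occur. */
int find_bytes(const char* str, const char* needle, match* m) {
    const char* hit = strstr(str, needle);
    if( hit == NULL ) return 0;
    m->start = (size_t)(hit - str);
    m->end = m->start + strlen(needle);
    return 1;
}

/* Hypothetical splice: returns a newly allocated copy of str in which
   the bytes [start, end) are replaced by rpl. The caller frees the
   result. Returns NULL if allocation fails. */
char* splice_bytes(const char* str, size_t start, size_t end, const char* rpl) {
    size_t len = strlen(str);
    size_t rpl_len = strlen(rpl);
    assert( start <= end && end <= len );
    char* out = malloc(len - (end - start) + rpl_len + 1);
    if( out == NULL ) return NULL;
    memcpy(out, str, start);                   /* bytes before the match */
    memcpy(out + start, rpl, rpl_len);         /* the replacement */
    memcpy(out + start + rpl_len, str + end,   /* the tail, including '\0' */
           len - end + 1);
    return out;
}

/* Replace the first occurrence of needle, or return an unchanged copy
   when there is no match, so the caller can always free() the result. */
char* replace_first(const char* str, const char* needle, const char* rpl) {
    match m;
    if( find_bytes(str, needle, &m) ) {
        return splice_bytes(str, m.start, m.end, rpl);
    }
    return splice_bytes(str, 0, 0, "");
}
```

Because UTF-8 is self-synchronising, a byte-wise match of one valid UTF-8 string inside another always starts and ends on a character boundary, so something like `replace_first(sentence, "ארופאית", "אפריקאית")` never cuts a character in half. It would still miss a match if the two strings were normalised differently, which is why normalising first matters.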
> Why not indeed? What a great idea.
It sounded to me like you were arguing that string manipulation functions do not need to be included in modern programming languages. You said: "don't decode to a string, and do all your character manipulation on the bytes".