So... are we trying to micro-optimize the standard library?
There will be bottlenecks narrower (is that the right term?) than string operations in any project not dealing exclusively with strings. If a project does deal exclusively with strings, rolling your own string utilities may make sense.
Of course you shouldn't optimize unless it's the biggest bottleneck and more performance (or battery life) is desired.
C string operations might have been elegant 40 years ago, but nowadays they're like rerouting a 747 to check if office lights are on.
strlen does a lot of work for little benefit. It performs slow, power-hungry memory accesses. Because it scans for the terminating 0x00 byte, it has to contain a branch, and the loop-terminating branch is usually a costly mispredicted one.
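To make the per-byte load and branch concrete, here is a minimal sketch of a byte-at-a-time scan. The name naive_strlen is made up for illustration and this is not how any particular libc implements strlen; real implementations typically scan a word or a SIMD vector at a time, but they still have to touch every byte up to the terminator.

```c
#include <assert.h>
#include <stddef.h>

/* Naive byte-at-a-time length scan. Hypothetical helper for illustration,
   not libc's strlen. */
size_t naive_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;
    /* One memory load and one conditional branch per byte. The loop
       continues for every non-zero byte and exits once, at the
       terminating '\0'. */
    while (*p != '\0') {
        ++p;
    }
    return (size_t)(p - s);
}
```

The cost grows linearly with the string length, which is why code that already knows the length usually passes it along instead of rescanning.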
C printf and friends are even more insane. They scan "bytecode" instructions from a string and do dynamic formatting. You can write format options like this:
printf("% 0#*.*f\n", 15, 5, 1.234);
Or, say, print five chars of an unterminated string, left-padded to a total length of 10:
printf("%10.5s\n", unterminated_str);
It won't (at least it shouldn't) crash even if the 6th character is on an unreadable memory page.
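A small sketch of both format strings above, written with snprintf so the result can be inspected. The function names and the five-byte array contents are made up for illustration; only the format strings come from the examples.

```c
#include <assert.h>
#include <stdio.h>

/* Formats 1.234 with the flags ' ', '0' and '#', taking the field width (15)
   and the precision (5) from the argument list via '*'. */
int format_star_float(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    return snprintf(out, out_size, "% 0#*.*f", 15, 5, 1.234);
}

/* Formats a char array that has no terminating '\0'. The precision .5 tells
   the formatter to read at most five bytes, and the width 10 pads the
   result with spaces on the left. */
int format_unterminated(char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);
    /* Exactly five bytes, deliberately without a terminator. */
    const char unterminated_str[5] = { 'h', 'e', 'l', 'l', 'o' };
    return snprintf(out, out_size, "%10.5s", unterminated_str);
}
```

With a 32-byte buffer, the first call should produce " 00000001.23400" (a space for the sign, zero padding, five decimals) and the second "     hello"; both return 10 or 15 respectively as the number of characters written.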
I don't think they are insane. They are safe if you follow some rules. With literal format strings, the -Wformat=2 flag catches most format mistakes at compile time. (It won't guard against overflow of course, but this is C.)
The last example is explicitly defined in the Standard.
The performance cost is crazy when the format string is parsed at runtime.
Implementing all that subtle functionality correctly takes up a lot of CPU time.
According to my quick test, on Visual Studio 2012, even the simplest sprintf with just one parameter seemed to take about 1 microsecond to execute.
Of course clang and gcc seem to sometimes compile the whole format parser away. At least...
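For anyone who wants to repeat that kind of measurement, here is a rough sketch of a timing loop. It is not the test used above: the helper name, the format string and the use of clock() are all assumptions, and the numbers will vary a lot with compiler, library and hardware.

```c
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Average CPU time of one snprintf call with a single int parameter, in
   nanoseconds. Hypothetical helper; clock() is coarse, so use a large
   iteration count. */
double snprintf_ns_per_call(long iterations)
{
    assert(iterations > 0);
    char buf[64];
    /* volatile sink keeps the compiler from discarding the calls. */
    volatile int sink = 0;
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i) {
        sink += snprintf(buf, sizeof buf, "value: %d", (int)i);
    }
    clock_t end = clock();
    double seconds = (double)(end - start) / CLOCKS_PER_SEC;
    return seconds * 1e9 / (double)iterations;
}
```

Calling snprintf_ns_per_call(1000000) a few times and taking the lowest result gives a ballpark figure; treat it as an order-of-magnitude check, not a benchmark.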
printf ("Hello World!\n");
... is optimized into a simple "puts("Hello World!");".
Of course iostream << operator runtime performance is also pretty horrible. Each << invocation seems to call streambuf::sputn (or sputc) separately. On VS2012, the simplest stringstream test...
ss << "value: " << intVal << endl;
... outputting one variable into it and turning the result into a std::string took about 3 microseconds. (Although it was a very quick test, a number of things might be suboptimal in the test code.)