| 1) On malloc() and free(), right, I was free just to
write my own. I should have. At various times
since for various reasons, I have just written my
own.

On your "K&R and other good C references describe their
public interface well and that's all you need to
know to use them effectively." I want more.

By analogy, all you need to drive a
car is what you see sitting behind the steering
wheel, but I also very much want to know what is
under the hood.

Generally I concluded that for 'effective' 'ease of
use', writing efficient code, diagnosing problems,
etc., I want to know what is going on at least one
level deeper than the level at which I am making the
most usage.

Your example of putting a 100,000 byte array on the
stack is an example: Without knowing some about
what is going on one level deeper, that seems to be
an okay thing to do.

2) My remark about the stack is either not quite
correct or is not being interpreted as I intended.
For putting an array on a push down stack of
storage, I am fully aware of the issues. But on a
'stack', maybe also the one used for such array
allocations (that PL/I called 'automatic'; I'm not
sure there is any corresponding terminology in C),
there is also the arguments passed to functions. It
seemed that this stack size had to be requested via
the linkage editor, and if too little space was
requested then just the argument lists needed for
calling functions could cause a 'stack overflow'. A
problem was, it was not clear how much space the
argument lists took up.

Then there was the issue of passing an array by
value. As I recall, that meant that the array would
be copied to the same stack as the arguments. Then
one array of 100,000 bytes could easily swamp any
other uses of the stack for passing argument lists.

But even without passing big 'aggregates' by value
or allocating big aggregates as 'automatic' storage
in functions, there were dark threats, difficult to
analyze or circumvent, of stack overflow. To write
reliable software, I want to know more, to be able
to estimate what resources I am using and when I
might be reaching some limit. In the case of the
stack allocated by the linkage editor for argument
lists, I didn't have that information.

3) Sure, I could make use of the strings in C as C
intended just as you state, just for textual data,
but also have to assume a single byte character set.

I thought that that design of strings was too
limited for no good reason. That is, with just a
slightly different design, could have strings that
would work for text with a single byte character set
along with a big basket of other data types. That's
what was done in Fortran, PL/I, Visual Basic .NET,
and string packages people wrote for C.

The situation is similar to what you said about
malloc(): All C provided for strings was just a
pointer to some storage; all the rest of the string
functionality was just in some functions, some of
which, but not all, needed the null termination.
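
To make that concrete: the standard `mem*` functions (`memcpy`, `memcmp`, `memchr`) take an explicit length and never look for a null. Below is a minimal sketch of a counted string built on them. The `counted_str` type and the `cs_*` names are hypothetical, made up for illustration, and not anything from K&R or the standard library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical counted-string view: a pointer plus an explicit length.
   The bytes need not be text and need not end in '\0'. */
typedef struct {
    const char *data;
    size_t      len;
} counted_str;

/* Equal when lengths match and the bytes match; memcmp never
   looks for a terminator. */
static int cs_equal(counted_str a, counted_str b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}

/* Index of the first byte equal to c, or -1 if it is absent. */
static long cs_find(counted_str s, unsigned char c)
{
    const char *p = s.len ? memchr(s.data, c, s.len) : NULL;
    return p ? (long)(p - s.data) : -1;
}

/* Copy into a caller-supplied buffer of capacity cap.
   Truncates if needed and returns the number of bytes copied. */
static size_t cs_copy(char *dst, size_t cap, counted_str src)
{
    size_t n = src.len < cap ? src.len : cap;
    assert(dst != NULL || n == 0);
    if (n > 0)
        memcpy(dst, src.data, n);
    return n;
}
```

Because the length travels with the pointer, an embedded zero byte is just another byte, and a copy cannot run past the capacity the caller states.
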
So, what I did with C strings was just use the
functions provided that didn't need the null
terminations or write my own little such functions.

As I mentioned, I didn't struggle with null
terminated strings; instead right from the start I
saw them as just absurd and refused ever to assume
that there was a null except in the case when I was
given such a string, say, from reading the command
line.

It has appeared that null terminated strings have
been one of the causes of buffer overflow malware.
To me, expecting that a null would be just where C
wanted it to be was asking too much for reliable
computing.

4) On casts, we seem not to be communicating well.

Data conversions are important, often crucial. As I
recall in C, the usual way to ask for a conversion
is to ask for a 'cast'. Fine: The strong typing
police are pleased, and I don't mind. And at times
the 'strongly typed pointers' did save me from some
errors.

But the question remained: Exactly how are the
conversions done? That is, for the set D of
'element' data types -- strings, bytes,
single/double precision integers, single/double
precision binary floating point, maybe decimal,
fixed and/or floating, and for any distinct a, b in
D, say if there is a conversion from a to b and if
so what are the details on how it works?

One reason to omit this from K&R would have been
that the conversion details were machine dependent,
e.g., depended on being on a 12, 16, 24, 32, 48, or
64 bit computer, signed magnitude, 2's complement,
etc. Still, whatever the reasons, I was pushed into
writing little test cases to get details, especially
on likely 'boundary cases', of how the conversions
were done. Not good.

Sure, this means that I am a sucker for using a
language closely tied to some particular hardware. So
far, fine with me: Microsoft documents their
software heavily for x86, 32 or 64 bits, from Intel
or AMD, and now a 3.0 GHz or so 8 core AMD processor
costs less than $200. So I don't mind being tied to
x86.

On PL/I: Thankfully, no, it was not nearly the
first language I learned. Why thankfully? Because
the versions I learned were huge languages. Before
PL/I I had used Basic, Fortran, and Algol.

PL/I was a nice example of language design in the
'golden age' of language design, the 1960s. You
would likely understand PL/I quickly.

So, PL/I borrowed nesting from Algol, structures
from Cobol, arrays and more from Fortran,
exceptional condition handling from some themes in
operating system design, threading (that it called
'tasking' -- current 'threads' are 'lighter in
weight' than the 'tasks' were -- e.g., with 'tasks'
all storage allocation was 'task-relative' and was
freed when the task ended), and enough in bit
manipulation to eliminate most uses of assembler in
applications programming. It had some rather nice
character I/O and some nice binary I/O for, say,
tape. It tried to have some data base I/O, but that
was before RDBMS and SQL.

In the source code, subroutines (or functions) could
be nested, and then there were some nice scope of
name rules. C does that but with only one level of
nesting; PL/I permitted essentially arbitrary levels
of nesting which at times was darned nice.

Arrays could have several dimensions, and the upper
bound and lower bound of each could be any 16 bit
integers as long as the lower was <= the upper -- 32
bit integers would have been nicer, and now 64 bit
integers. Such array addressing is simple: Just
calculate the 'virtual origin', that is, the address
of the array component with all the subscripts 0,
even if that location is out in the backyard
somewhere, and then calculate all the actual
component addresses starting with the virtual origin
and largely forgetting about the bounds unless have
bounds checking turned on. Nice.

A structure was, first-cut, much like a struct in C,
that is, an ordered list of possibly distinct data
types, except each 'component' could also be a
structure so that really was writing out a tree.
Then each node in that tree could be an array. So,
could have arrays of structures of arrays of
structures. Darned useful. Easy to write out,
read, understand, and use. And dirt simple to
implement just with a slight tweak to ordinary array
addressing. So, it was just an 'aggregate', still
all in essentially contiguous, sequential storage.
So, there was no attempt to have parts of the
structure scattered around in storage. E.g., doing
a binary de/serialize was easy. The only tricky
part was the same as in C: What to do about how to
document the alignment of some element data types on
certain address range boundaries.

Each aggregate has a 'dope vector' as I described.
So, what was in an argument list was a pointer to
the dope vector, and it was like a C struct with
details on array upper and lower bounds, a pointer
to the actual storage, etc.

PL/I had some popularity -- Multics was written in
it.

For C, PL/I was solid before C was designed. So, C
borrowed too little from what was well known when C
was designed. Why? The usual reason given was that
C was designed to permit a single pass compiler on a
DEC mini-computer with just 8 KB of main memory and
no virtual memory. IBM's PL/I needed a 64 KB
360/30. But there were later versions of PL/I that
were nice subsets.

It appears that C caught on because DEC's mini
computers were comparatively cheap and really
popular in technical departments in universities;
Unix was essentially free; and C came with Unix. So
a lot of students learned C in college. Then as PCs
got going, the main compiled programming language
used was just C.

Big advantages of C were (1) it had pointers crucial
for system programming, (2) needed only a relatively
simple compiler, (3) had an open source compiler
from Bell Labs, and (4) was so simple that the
compiled code could be used in embedded
applications, that is, needed next to nothing from
an operating system.

The C pointer syntax alone is fine. The difficulty
is the syntax of how pointers are used or implied
elsewhere in the language. Some aspects of the
syntax are so, to borrow from K&R, 'idiosyncratic'
that some examples are puzzle problems where I have
to get out K&R and review.

To me, such puzzle problems are not good.

I will give just one example of C syntax:

i = j+++++k;

Right: Add 1 to k; add that k to j and assign the
result to i; then add one to j. Semi-, pseudo-,
quasi-great.

I won't write code like that, and in my startup I
don't want us using a language that permits code
like that. |
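
A side note on that one-liner, offered as a sketch rather than a ruling: C tokenizes greedily, so without spaces `j+++++k` is read as `j ++ ++ + k`, and a conforming compiler rejects it. The reading given above (increment k, add it to the old j, then increment j) matches the spaced form `j++ + ++k`. The helper name `add_post_pre` is made up for illustration.

```c
#include <assert.h>

/* Hypothetical helper showing the spaced-out form of the expression.
   Written without spaces, j+++++k is tokenized as j ++ ++ + k and
   does not compile. */
static int add_post_pre(int *j, int *k)
{
    assert(j != k);            /* the two operands must be distinct objects */
    return (*j)++ + ++(*k);    /* old *j plus the incremented *k */
}
```

With j = 2 and k = 5, the call returns 8 and leaves j = 3 and k = 6.
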
My personal view is that C offers a perfect tradeoff between simplicity and capability. It has a magical quality that has made it the most important single computer language for nearly 40 years and on into the foreseeable future. Increasingly its importance is as a layer that more programmer-friendly technology sits upon, but it's no less important for that.
I've read that the difference between chess and go (the oriental game, not golang) is this: little green men on Alpha Centauri might play a game that resembles chess, but they will almost certainly play a game identical to go. Go is simple enough that it is almost inevitable. For me it's almost the same thing (I stress almost) with computer languages and C.
One final point: C syntax is ultimately a matter of taste. If you find this to be a completely obvious, correct and straightforward way of doing a non-overlapping C string copy;
Then you 'get' C. If you find it a confusing monstrosity, maybe C isn't your language.
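
The idiom meant is presumably the classic pointer-walking loop from K&R, usually written as `while (*s++ = *t++) ;`. Here is a minimal sketch, assuming that is the one intended; the function name `copy_string` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical name. Copies the string at src, terminator included,
   to dst. dst must have room and the two regions must not overlap. */
static char *copy_string(char *dst, const char *src)
{
    char *start = dst;
    assert(dst != NULL && src != NULL);
    while ((*dst++ = *src++) != '\0')
        ;   /* assignment, both increments and the test are all in the condition */
    return start;
}
```

The explicit comparison with '\0' only spells out what the bare K&R form tests implicitly.
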