Hacker News new | ask | show | jobs
by nabla9 2251 days ago
> no such feature has emerged in practice

Arrays with length constantly emerge among C users and libraries. They are just all incompatible because without standardization there is no convergence.

4 comments

I think the problem is that C is simply ill-suited for these "high level" constructs. The best you're likely to get is an ad-hoc special library like for wchar_t and wcslen and friends. Do we really want that?

I'd argue that linked list might make a better candidate for inclusions, because I've seen the kernel's list.h or similar implementations in many projects and that's stuff is trickier to get right than stuffing a pointer and a size_t in a struct.

Sounds like a good use of standardization. If there is existing implementation practice, please go ahead and submit a proposal. I would be happy to champion such a proposal if you can't attend in person.
It was an observation, not suggestion.

When the language standardization body has not managed to add arrays with length in 48 years, I don't think it should be added at this point. The culture is backward looking and incompatible with modern needs and people involved are old and incompatible with the future (no offense, so am I).

C standardization effort should focus on finishing the language, not developing it to match modern world. I have programmed with C over 20 years, since I was a teenager. It's has long been the system programming language I'm most familiar with. For the last 10 years I have never written an executable. Just short callable functions from other languages. Python, Java, Common Lisp, Matlab, and 'horrors or horrors' C++.

I think Standard C's can live next 50 years in gradual decline as portable assembler called from other languages and compilation target.

If I would propose new extension to C language, I would propose completely new language that can be optionally compiled into C and works side by side with old C code.

> If I would propose new extension to C language, I would propose completely new language that can be optionally compiled into C and works side by side with old C code.

There are a few somewhat popular languages that fit that description already, and none of them are suitable replacements for C (as far as I've seen). That's not to say there couldn't be a suitable replacement -- just that nobody in a position to do something about it wants the suitable replacement enough for it to have emerged, apparently.

I suspect the first really suitable complete replacement for C would be something like what Checked C [1] tried to be, but a little more ambitious and willing to include wholly new (but perhaps backward-compatible) features (like some of those you've proposed) implemented in an interestingly new enough way to warrant a whole new compile-to-C implementation. Something like that could greatly improve the use cases where a true C replacement would be most appreciated, and still fit "naturally" into environments where C is already the implementation language of choice via a piecemeal replacement strategy where the first step is just using the new language's compiler as the project compiler front end's drop-in replacement (without having to make any changes to the code at all for this first step).

1: https://www.microsoft.com/en-us/research/project/checked-c/

Sounds like you are describing Zig. https://ziglang.org
I haven't looked at Zig too closely yet (only started just a few minutes ago), but it immediately appears to me that this violates one of the requirements I suggested, as demonstrated by this use-case wish from my previous comment:

> > using the new language's compiler as the project compiler front end's drop-in replacement (without having to make any changes to the code at all for this first step)

I'll look into Zig more, though. Maybe I'll like it.

---

I stand corrected, given my phrasing. I should have specified that it needs to also support incrementally adding the new language's features while most of the code is still unaltered C, rather than (for instance) having to suddenly replace all the includes and function prototypes just because you want to add (in the case of Zig) an error "catch" clause.

You can use the Zig compiler to compile C with no modifications, and easily call C from Zig or Zig from C, so I'm not sure what more you're hoping for. A language that allows you to mix standard C and "improved C" in the same file sounds like a mess to me.
typedef struct {uint8_t *data; size_t len;} ByteBuf; is the first line of code I write in a C project.
Could you add some extra information why this is so helpful or handy to have? Think it will benefit readers that are starting out with C etc.
In C, dynamically-sized vectors don’t carry around size information with them, often leading to bugs. This struct attempts to keep the two together.
Memory corruption in sudo password feedback code happened because length and pointer sit as unrelated variables and have to be manipulated by two separate statements every time like some kind of manually inlined function. For comparison putty slice API handles slice as a whole object in a single statement keeping length and pointer consistent.
Another option is a struct with a FAM at the end.

  typedef struct {
      size_t len;
      uint8_t data[];
  } ByteBuf;
Then, allocation becomes

  ByteBuf *b = malloc(sizeof(*b) + sizeof(uint8_t) * array_size);
  b->len = array_size;
and data is no longer a pointer.
Well, your ByteBuf is still a pointer. You also now need to dereference it to get the length. It also can't be passed by value, since it's very big. You can also not have multiple ByteBufs pointing at subsections of the same region of memory.

Thing is, you rarely want to share just a buffer anyway. You probably have additional state, locks, etc. So what I do is embed my ByteBuf directly into another structure, which then owns it completely:

    typedef struct {
        ...
        ByteBuf mybuffer;
        ...
    } SomeThing;
So we end up with the same amount of pointers (1), but with some unique advantages.
Right, totally depends on what you're doing. My example is not a good fit for intrusive use cases.
sizeof(ByteBuf) == sizeof(size_t), and you can pass it by value; I just don't think you can do anything useful with it because it'll chop off the data.
This will an alignment problem on any platform with data types larger than size_t. You'd need an alignas(max_align_t) on the struct. At which point some people are going to be unhappy about the wasteful padding on a memory constrained target.
Why not typedef struct {uint8_t *data, dataend} ?

Makes it easier to take subranges out of it

should be

  typedef struct {uint8_t *data, *dataend} 
if I'm not mistaken :)
What are the advantages of saving the end as a pointer? Genuinely curious. Seems like a length allows the end pointer to be quickly calculated (data + len), while being more useful for comparisons, etc.
You can remove the first k elements of a view with data += k.

With the length you would need to do data += k; length -= k

Especially if you want to use it as safe iterator, you can do data++ in a loop

> ...You can remove the first k elements of a view with data += k.

How would you safely free(data) afterwards? You'd need to keep an alloc'ed pointer somehow.

Got it. That is really neat, going to add to my bag of tricks...
Right. I always think the pointer declaration is part of the type. (that is why I do not use C. Is there really a good reason for this C syntax?)
That's a really bizarre layout for your struct. Why don't you put the length first?
Why would it matter? The bytes aren't inline, this is just a struct with two word-sized fields.

A possible tiny advantage for this layout is that a pointer to this struct can be used as a pointer to a pointer-to-bytes, without having to adjust it. Although i'm not sure that's not undefined behaviour.

I don't think that's undefined behavior. That's how C's limited form of polymorphism is utilized. For example, many data structures behind dynamic languages are implemented in this way. A concrete example would be Python's PyObject which share PyObject_HEAD.

https://github.com/python/cpython/blob/master/Include/object...

I'm not sure if it matters. It might be better for some technical reason, such as speeding up double dereferences, because you don't need to add anything to get to the pointer. But to be honest I just copied it out of existing code.
Most platforms have instructions for dereferencing with a displacement.
The "existing practice" qualification refers to existing compiler extensions I'd guess. Then lobbying about the feature should be addressed to eg LLVM and GCC developers.