
by zerodensity 1064 days ago

> For a start, String objects are going to be different everywhere. Even in C++, one library's String object isn't going to be binary compatible with another. How is the data laid out, does it do small string optimisation, etc? Are there other internal fields?

I don't really think anyone expects a C ABI to have multiple implementation-defined string types. They want a pointer + length string interface, removing the use of null-terminated strings altogether.

> If this new style ABI returns a 'string' (pointer and length) of some sort, you have to package it up in your own object format

A C function with proper error handling (which is something you want for all your interface functions) normally looks something like this:

int name(T1 param_1, T2 param_2, ..., TN param_n, R1* return_1, R2* return_2, ..., RN* return_n);

Here the returned int is the error code, param_1 through param_n are the input parameters, and return_1 through return_n receive the results of the function.

When writing these kinds of functions, having an extra parameter for the size of each string, whether input or output, is not a huge increase in complexity.
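As a rough illustration of that shape, here is a minimal C sketch. The function name make_greeting, the str_err codes, and the convention of reporting the required length even when the buffer is too small are all made up for this example; they are not taken from any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical error codes for this sketch; 0 means success. */
enum str_err { STR_OK = 0, STR_EINVAL = 1, STR_ERANGE = 2 };

/* Hypothetical interface function following the pattern above:
 * inputs first (pointer + length), results through out-parameters,
 * error code as the return value. No NUL terminator is read or written. */
int make_greeting(const char *name, size_t name_len,
                  char *out, size_t out_cap, size_t *out_len)
{
    static const char prefix[] = "hello, ";
    const size_t prefix_len = sizeof prefix - 1;  /* drop the literal's NUL */
    size_t needed;

    assert(out_len != NULL);  /* caller bug: nowhere to report the length */
    if (name == NULL && name_len != 0)
        return STR_EINVAL;
    if (name_len > SIZE_MAX - prefix_len)
        return STR_ERANGE;    /* total length would overflow size_t */

    needed = prefix_len + name_len;
    *out_len = needed;        /* reported even if the buffer is too small */
    if (out == NULL || out_cap < needed)
        return STR_ERANGE;

    memcpy(out, prefix, prefix_len);
    if (name_len != 0)
        memcpy(out + prefix_len, name, name_len);
    return STR_OK;
}
```

Compared with a null-terminated version, the only additions are the three size parameters (name_len, out_cap, out_len).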

> Will you need an extra object type to represent 'string I got from an ABI, whose memory is managed differently'?

Which memory management system you use does not affect whether you use null-terminated strings or a pointer + length pair. Both support stack, manual, managed, or GC memory. It's just about the string representation.

For example:

I use a GC language.

I call a C library which returns a string that I get ownership of.

Now I want to leverage the GC to automatically free the string at some point. What I do is tell the GC how to free it. I have to do this no matter how the string is represented.

Or take the inverse.

I send a string to the C library, which takes ownership of it.

Now the library must know how to free the memory. Typically this is done by allocating it with a library allocator (which can be malloc) before sending it to the function. Importantly, that allocator is not the same as the one we use for everything else.

What I am getting at is that if you are not using the same memory system in the caller and the callee, you always have to marshal between them, no matter whether you are using null-terminated strings or a pointer + length pair.
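To make the marshaling step concrete, here is a small C sketch of the first case: a library hands out an owned pointer + length string, and the caller copies it into its own memory before giving the original back to the library's allocator. Everything here (lib_str, lib_alloc, lib_free, lib_get_str, adopt_str, and the lib_live counter) is hypothetical and exists only for this example. The same copy-then-release step would be needed if lib_get_str returned a null-terminated char * instead.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library-side string: pointer + length, no terminator. */
typedef struct {
    char  *ptr;
    size_t len;
} lib_str;

static long lib_live = 0;  /* outstanding library allocations (for the demo) */

/* Stand-ins for the library's own allocator. */
static void *lib_alloc(size_t n)
{
    void *p = malloc(n ? n : 1);
    if (p != NULL)
        lib_live++;
    return p;
}

static void lib_free(void *p)
{
    if (p != NULL) {
        lib_live--;
        free(p);
    }
}

/* Library call that returns a string the caller now owns; 0 on success. */
int lib_get_str(lib_str *out)
{
    static const char text[] = "from the library";

    assert(out != NULL);
    out->len = sizeof text - 1;
    out->ptr = lib_alloc(out->len);
    if (out->ptr == NULL)
        return 1;
    memcpy(out->ptr, text, out->len);
    return 0;
}

/* Caller-side marshaling: copy into caller-owned memory (plain malloc here,
 * NUL-terminated for the caller's convenience), then release the library's
 * copy with the library's allocator. Returns NULL on allocation failure,
 * in which case *s is left untouched. */
char *adopt_str(lib_str *s, size_t *len_out)
{
    char *mine = malloc(s->len + 1);

    if (mine == NULL)
        return NULL;
    memcpy(mine, s->ptr, s->len);
    mine[s->len] = '\0';
    if (len_out != NULL)
        *len_out = s->len;

    lib_free(s->ptr);
    s->ptr = NULL;
    s->len = 0;
    return mine;
}
```

In a GC language the equivalent of adopt_str would either copy into a managed string like this, or wrap the lib_str in an object whose finalizer calls lib_free.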

2 comments

kazinator 1064 days ago

> pointer + length string interface

If it's a 32-bit length, that will be limiting for some 64-bit programs.

If it's a 64-bit length, it means tiny strings take up more space.

Hey, do both! Have the length be a "size_t" and then have a "compat_32" shim around every single system call that takes at least one string argument.

Wee!
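For concreteness, such a shim might look roughly like the C sketch below. sys_write_str and compat_32_sys_write_str are hypothetical names, the "system call" just echoes the length it was given, and a real compat layer would also have to translate 32-bit pointers, which this sketch ignores.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical native entry point: the length is a size_t.
 * Stand-in for real work: reports how many bytes it was given. */
int sys_write_str(const char *ptr, size_t len, size_t *written)
{
    assert(written != NULL);
    if (ptr == NULL && len != 0)
        return -1;
    *written = len;
    return 0;
}

/* Hypothetical compat_32 shim: older callers pass a 32-bit length and
 * expect a 32-bit result, so every such call needs a wrapper like this. */
int compat_32_sys_write_str(const char *ptr, uint32_t len, uint32_t *written)
{
    size_t w = 0;
    int rc;

    assert(written != NULL);
    rc = sys_write_str(ptr, (size_t)len, &w);  /* widen the length */
    if (rc != 0)
        return rc;
    if (w > UINT32_MAX)
        return -2;                             /* result does not fit */
    *written = (uint32_t)w;                    /* narrow the result */
    return 0;
}
```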

Imagine a parallel world in which mainstream OS kernel developers had seen the light 30 years ago and used len + data for system calls. You'd now have to support ancient binary programs that are passing strings where the length is uint16. Oh right, I forgot! We can just screw programs that are more than five years old. All the cool users are on the latest version of everything.

> if you are not using the same memory system in the caller and the callee, you always have to marshal between them, no matter whether you are using null-terminated strings or a pointer + length pair.

Null-terminated byte strings are always marshaled and ready to be sent literally anywhere. They have no byte order issues. No multi-byte length field whose size and endianness we have to know. If they are UTF-8, their character encoding is already marshaled also (that's the point of using UTF-8 everywhere).
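A small C sketch of the difference being described: to put a length-prefixed string into a byte buffer, both sides have to agree on the width and byte order of the length field, while the null-terminated form is just the bytes plus a terminator. The function names and the choice of a 4-byte little-endian length are assumptions made for this example only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Writes a 4-byte little-endian length followed by the string bytes.
 * Returns the number of bytes written, or 0 if the string does not fit
 * in the buffer or its length does not fit in 32 bits. */
size_t put_len_prefixed(uint8_t *buf, size_t cap, const char *s, size_t len)
{
    assert(buf != NULL && s != NULL);
    if (len > UINT32_MAX || cap < 4 || cap - 4 < len)
        return 0;
    /* The field width and byte order are fixed explicitly, byte by byte. */
    buf[0] = (uint8_t)(len & 0xff);
    buf[1] = (uint8_t)((len >> 8) & 0xff);
    buf[2] = (uint8_t)((len >> 16) & 0xff);
    buf[3] = (uint8_t)((len >> 24) & 0xff);
    memcpy(buf + 4, s, len);
    return 4 + len;
}

/* Writes the string bytes followed by the NUL terminator.
 * Returns the number of bytes written, or 0 if it does not fit.
 * No length field, so no width or byte order to agree on. */
size_t put_nul_terminated(uint8_t *buf, size_t cap, const char *s)
{
    size_t len;

    assert(buf != NULL && s != NULL);
    len = strlen(s);
    if (cap <= len)
        return 0;
    memcpy(buf, s, len + 1);
    return len + 1;
}
```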