Hacker News

by delta_p_delta_x 119 days ago

> zero copy and zero allocations

This is a red herring, because when you actually read the strings out, you still have to scan each one for its terminating NUL to learn its length: zero copy and zero allocations, but still linear work per string.

> query file size, allocate buffer once, read it into the buffer, drop some NULL's into strategic positions, maybe shuffle some bytes around for that rare escape case, and you have a whole bunch of C strings, ready to use, and with no length limits.
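A minimal C sketch of the parse-in-place approach described in the quote. It assumes a hypothetical newline-separated format with no escapes, so the byte-shuffling step is left out. The names `load_file` and `split_lines_in_place` are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read a whole file into a single malloc'd buffer with one spare byte
   for a trailing NUL. Returns NULL on failure; *out_len gets the size. */
char *load_file(const char *path, size_t *out_len) {
    FILE *f = fopen(path, "rb");
    if (!f) return NULL;
    long size = -1;
    if (fseek(f, 0, SEEK_END) == 0) size = ftell(f);  /* query file size */
    if (size < 0 || fseek(f, 0, SEEK_SET) != 0) { fclose(f); return NULL; }
    char *buf = malloc((size_t)size + 1);             /* allocate once */
    if (!buf) { fclose(f); return NULL; }
    size_t got = fread(buf, 1, (size_t)size, f);      /* read it in */
    fclose(f);
    buf[got] = '\0';
    *out_len = got;
    return buf;
}

/* Overwrite each '\n' with '\0' so every line becomes a C string that
   points into buf. Requires buf[len] == '\0' so the last line is
   terminated too. Stores at most max_out pointers; returns the count. */
size_t split_lines_in_place(char *buf, size_t len, char **out, size_t max_out) {
    assert(buf != NULL && buf[len] == '\0');
    size_t n = 0;
    char *p = buf, *end = buf + len;
    while (p < end && n < max_out) {
        char *nl = memchr(p, '\n', (size_t)(end - p));
        if (nl) *nl = '\0';   /* drop a NUL into a strategic position */
        out[n++] = p;
        if (!nl) break;       /* final line had no trailing newline */
        p = nl + 1;
    }
    return n;
}
```

Every returned pointer is valid only while the buffer from `load_file` is alive, and a single `free(buf)` releases all the strings at once.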

I write parsers very differently: I keep the file buffer around, read-only, until the end of the pipeline, take string views into it, and pass those along to the next step.
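For contrast, the string-view style can be sketched in C as a (pointer, length) pair into the untouched buffer. This assumes a hypothetical comma-separated format. `str_view`, `split_views` and `view_eq` are illustrative names, not from any library. Each view stores its length, so no field is ever scanned for a terminator, and the buffer is never written to.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Non-owning view into a buffer that must outlive it; not NUL-terminated. */
typedef struct {
    const char *ptr;
    size_t len;
} str_view;

/* Split a read-only buffer on delim without modifying it. A buffer with
   k delimiters yields k + 1 fields (some possibly empty). Stores at most
   max_out views; returns the count. */
size_t split_views(const char *buf, size_t len, char delim,
                   str_view *out, size_t max_out) {
    assert(buf != NULL && out != NULL);
    size_t n = 0;
    const char *p = buf, *end = buf + len;
    while (n < max_out) {
        size_t remaining = (size_t)(end - p);
        const char *d = remaining ? memchr(p, delim, remaining) : NULL;
        const char *stop = d ? d : end;
        out[n].ptr = p;
        out[n].len = (size_t)(stop - p);
        n++;
        if (!d) break;        /* no more delimiters: that was the last field */
        p = d + 1;
    }
    return n;
}

/* Compare a view against a C string. The view's own length is already
   known; only s needs a strlen(). */
int view_eq(str_view v, const char *s) {
    size_t slen = strlen(s);
    return v.len == slen && memcmp(v.ptr, s, slen) == 0;
}
```

The cost moves elsewhere: the buffer has to stay alive and unmodified until the last view is consumed, and anything that insists on a C string needs a copy.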

theamk 119 days ago

I don't see what's a "red herring" about it: for a reasonable format, any parser will be O(n) anyway, so all we can do is reduce the constant factor.

So _today_ I write parsers differently as well: copying strings is very cheap now, and avoiding it is not worth the extra complexity.

But remember we are talking about the past, when these conventions were being established. Back in the '90s, zero copy and zero allocations were a real advantage, not in the theoretical CS sense but in a very practical one. Remember there was _no_ "dynamically resizing vector" in C's (or Pascal's) standard library: there was just raw malloc() and realloc(), and it was up to you to assemble a vector from them as needed. The free()/malloc() overhead was non-trivial too, so you had to reuse the buffer and grow it as needed. And if you wanted to store the parsed data, keeping a separate length for each string would double the size of your index! So the parse-in-place + null-terminated strings approach gave you both smaller code and lower runtime cost, at the expense of a few sharp corners. But we were all running with scissors back then.
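To make the point about the standard library concrete, here is a minimal sketch of the kind of growable array you had to assemble yourself from malloc()/realloc(), holding string pointers into a parse buffer. `ptr_vec` and its functions are hypothetical names; the doubling growth policy and starting capacity of 8 are assumptions, one common choice among several.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hand-rolled growable array of string pointers. */
typedef struct {
    char **items;
    size_t len;
    size_t cap;
} ptr_vec;

/* Append one pointer, doubling capacity when full so the number of
   realloc() calls grows only logarithmically. Returns 0 on success and
   -1 on failure, leaving the vector unchanged. */
int ptr_vec_push(ptr_vec *v, char *s) {
    assert(v != NULL);
    if (v->len == v->cap) {
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        if (new_cap > SIZE_MAX / sizeof *v->items) return -1;  /* overflow */
        char **grown = realloc(v->items, new_cap * sizeof *grown);
        if (!grown) return -1;  /* old block is still valid and still owned */
        v->items = grown;
        v->cap = new_cap;
    }
    v->items[v->len++] = s;
    return 0;
}

/* Forget the contents but keep the allocation for the next parse. */
void ptr_vec_clear(ptr_vec *v) { v->len = 0; }

/* Release the storage; the pointed-to strings are not owned here. */
void ptr_vec_free(ptr_vec *v) {
    free(v->items);
    v->items = NULL;
    v->len = v->cap = 0;
}
```

`ptr_vec_clear` is the "reuse and grow the buffer" part: once the array has reached the size of the largest input, later parses stop calling realloc() at all.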

dh2022 119 days ago

I think the concern was conserving memory (which was scarce back then), not avoiding iteration over each substring.

delta_p_delta_x 119 days ago

I am very sceptical about that. Much safer and cleaner languages like ML and Lisp were contemporary with C, and were likewise developed on memory-scarce hardware.

kelnos 119 days ago

They were also comparatively slow, no? And their runtimes used up much more of that scarce memory than a C program did.

theamk 119 days ago

Maybe on the high-end machines in some fancy lab somewhere?

All I saw were 386s and 486s, and I am pretty sure every piece of software I ever used was written in C, Turbo Pascal, or straight assembly. In the mid-90s Java appeared, and I remember how horribly slow those Java apps were compared to C/Pascal code.

priceishere 119 days ago

But does it even conserve memory? Copying a string when you have the length is 2 bytes of machine code on x86 (rep movsb).
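To illustrate the copying side of the argument: when the length is already known, making an owned copy needs no scan for a terminator, only one allocation and one bounded copy. `copy_counted` is a hypothetical helper. Whether `memcpy` actually compiles down to `rep movsb` depends on the compiler, flags and target, so treat that comment as a possibility, not a guarantee.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Copy len bytes starting at src into a fresh NUL-terminated string.
   The length is known up front, so there is no terminator scan:
   one allocation plus one bounded copy. Returns NULL if malloc fails. */
char *copy_counted(const char *src, size_t len) {
    assert(src != NULL);
    char *dst = malloc(len + 1);
    if (!dst) return NULL;
    memcpy(dst, src, len);   /* may become rep movsb on x86, or a library call */
    dst[len] = '\0';
    return dst;
}
```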

Remember, code takes up memory too.