	by throwawaymaths 1086 days ago
	Not only that, but some have solved it while maintaining compatibility with null terminated strings. Null terminated strings, after all, are sometimes more efficient.

zerodensity 1086 days ago

Can't think of a single case where null terminated strings are more efficient. Could you give some examples?

slaymaker1907 1086 days ago

It takes 4-8 bytes to represent the size of the string versus 1 byte for a null terminator. That doubles the size of the string reference (a pointer plus a length instead of a bare pointer) when you embed it in a struct or pass it as an argument on the stack. In particular, remember that even today cache lines are only 64 bytes on x86-64, and while that seems like a lot, going from 64 bytes to 68 means you go from one cache miss to two when loading some struct.

GrumpySloth 1086 days ago

Same as going from 64 to 65.
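
The size argument in this subthread can be sketched in C. This is only an illustration: the struct names are hypothetical, and the cache-line arithmetic assumes 64-byte lines and an object that starts on a line boundary.

```c
#include <stddef.h>

/* Hypothetical record that refers to a NUL-terminated name: one pointer. */
struct record_cstr {
    const char *name;
};

/* Hypothetical record that refers to the same name as pointer plus length. */
struct record_slice {
    const char *name;
    size_t      len;
};

/* Extra bytes each string field costs when the length is stored explicitly. */
size_t slice_overhead(void) {
    return sizeof(struct record_slice) - sizeof(struct record_cstr);
}

/* Number of 64-byte cache lines touched by an object of `size` bytes,
   assuming (for illustration only) that it starts on a line boundary. */
size_t lines_touched(size_t size) {
    return (size + 63) / 64;
}
```

Under those assumptions, 64 bytes fit in one line, while both 65 and 68 bytes spill into a second one, which is the point of the reply above.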

throwawaymaths 1086 days ago

> Can't think of a single case where null terminated strings are more efficient

https://lemire.me/blog/2020/09/03/sentinels-can-be-faster/?a...

umanwizard 1086 days ago

If you have to walk the string anyway, the null terminator has no downside.

zerodensity 1086 days ago

According to the bible (https://www.agner.org/optimize/), it's faster to use a loop with a length than to walk through a pointer, so not having a length makes it slower to walk the string while also making things like SIMD optimizations harder for the compiler to do.

throwawaymaths 1086 days ago

That doesn't make sense. If you have a loop with a length, you have to check both the content of the byte and the index; if you have null terminated strings, you only check the content of the byte.

GrumpySloth 1086 days ago

When you have the length, you can unroll the loop, so that you e.g. do 4 iterations at a time. With NUL you can’t do that. Moreover, loop iteration can be done in parallel (instruction-level parallelism) with processing the content of the string, since there is no data dependency between the two. With NUL you introduce a data dependency.
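
A minimal sketch of the two loop shapes being compared, using byte counting as a stand-in workload. The function names are hypothetical, and the manual 4x unrolling only illustrates what a known trip count makes possible; it is not a benchmark or a claim about what any particular compiler emits.

```c
#include <stddef.h>

/* Length-based: the trip count is known before the loop starts, so the
   body can be unrolled four bytes at a time. The index update does not
   depend on the bytes being read. */
size_t count_byte_len(const char *s, size_t len, char c) {
    size_t n = 0, i = 0;
    for (; i + 4 <= len; i += 4) {
        n += (s[i] == c);
        n += (s[i + 1] == c);
        n += (s[i + 2] == c);
        n += (s[i + 3] == c);
    }
    for (; i < len; i++)          /* leftover 0 to 3 bytes */
        n += (s[i] == c);
    return n;
}

/* Sentinel-based: every step has to load and test the current byte
   before it knows whether another iteration exists. */
size_t count_byte_nul(const char *s, char c) {
    size_t n = 0;
    for (; *s != '\0'; s++)
        n += (*s == c);
    return n;
}
```

Both functions return the same count for the same text; the difference is only in whether the loop exit condition depends on the loaded data.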

throwawaymaths 1086 days ago

You can't always do those things. Yes, pointer-plus-length is almost always faster. But it's not always faster.

https://lemire.me/blog/2020/09/03/sentinels-can-be-faster/?a...

kevin_thibedeau 1086 days ago

You can modify the length of NUL strings in place by inserting a new NUL or overwriting past the end without any other bookkeeping. You can split a string on delimiters simply by overwriting them with NULs.
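
The in-place split described here might look roughly like the following sketch. `split_in_place` is a hypothetical helper, not a standard library function, and it assumes the caller passes a writable buffer rather than a string literal.

```c
#include <stddef.h>

/* Overwrites delimiters in the writable, NUL-terminated buffer `s` with
   NUL and stores a pointer to the start of each piece in `out`.
   At most `max` pieces are produced; once `out` is full, remaining
   delimiters are left untouched so the last piece keeps the rest.
   Returns the number of pieces stored. */
size_t split_in_place(char *s, char delim, char **out, size_t max) {
    size_t n = 0;
    if (max == 0)
        return 0;
    out[n++] = s;                 /* first piece starts at the buffer */
    for (; *s != '\0'; s++) {
        if (*s == delim && n < max) {
            *s = '\0';            /* terminate the previous piece */
            out[n++] = s + 1;     /* next piece starts after the delimiter */
        }
    }
    return n;
}
```

No allocation or length bookkeeping is needed, but the original buffer no longer contains the delimiters afterwards.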

jrpelkonen 1086 days ago

I find these arguments rather weak. I don’t see how writing a NUL is any more efficient than updating a length. Furthermore, having the terminators in-band prevents the character data from being used for multiple substrings. E.g. in the split example, the original string is no longer available.
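
For contrast, a sketch of the same kind of split done with pointer-plus-length views, where the pieces share the original bytes and nothing is overwritten. `struct slice` and `next_piece` are hypothetical names introduced only for this illustration.

```c
#include <stddef.h>

/* Hypothetical non-owning view: pointer plus length, no terminator. */
struct slice {
    const char *ptr;
    size_t      len;
};

/* Returns the part of `s` before the first `delim` and sets `*rest` to
   whatever follows the delimiter. The underlying bytes are only read. */
struct slice next_piece(struct slice s, char delim, struct slice *rest) {
    size_t i = 0;
    while (i < s.len && s.ptr[i] != delim)
        i++;
    struct slice piece = { s.ptr, i };
    if (i < s.len)
        i++;                      /* skip the delimiter itself */
    rest->ptr = s.ptr + i;
    rest->len = s.len - i;
    return piece;
}
```

Because the pieces are just offsets into the same storage, the original text stays intact and several overlapping substrings can coexist.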

2h 1086 days ago

This ignores the fact that strings can contain a null character. For example, this is a valid Go program:

    package main
    
    import "fmt"
    
    func main() {
        fmt.Printf("%q\n", "hello \x00 world")
    }
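
For comparison, a small C sketch of what happens to the same bytes when they are treated as a NUL-terminated string. The helper names are hypothetical; `strlen` and `sizeof` are standard C.

```c
#include <stddef.h>
#include <string.h>

/* The same 13 bytes as the Go literal above, plus the terminator that C
   appends to every string literal. */
static const char data[] = "hello \0 world";

/* Length as the C string functions see it: stops at the first NUL. */
size_t visible_length(void) {
    return strlen(data);
}

/* Length of the stored payload, excluding only the trailing terminator. */
size_t stored_length(void) {
    return sizeof(data) - 1;
}
```

The array holds all 13 bytes, but anything that relies on the terminator sees only the 6 bytes before the embedded NUL.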

throwawaymaths 1086 days ago

I'm not sure what your point is. We're talking about C null terminated strings, not Go.
