Hacker News new | ask | show | jobs
by MilStdJunkie 693 days ago
Holy smokes. I'm no programmer, but I've built out bazillions of publishing/conversion/analysis systems, and null purge is pretty much the first thing that happens, every time. x00 breaks virtually everything just by existing - like, seriously, markup with one of these damn things will make the rest of the system choke and die as soon as it looks at it. Numpy? Pytorch? XSL? Doesn't matter. cough cough cough GACK

And my systems are all halfass, and I don't really know what I'm doing. I can't imagine actual real professionals letting that moulder its way downstream. Maybe their stuff is just way more complex and amazing than I can possibly imagine.

2 comments

Binary files are full of null bytes it is one of the main criteria of binary file recognition. Also large swaths of null bytes are also common, common enough we have sparse files - files with holes in them. Those holes are all zeroes, but are not allocated in the file system. For an easy example think about a disk image.
Ah, I'm a dummy. Of course these would be binaries getting chucked around.
Not a C programmer, why is 0x00 so bad? It's the string terminator character right?
It's a byte like any other. You're more likely to see big files full of 0x0 than 0x1, but it's really not so different.
Indeed, '\0' is the sentinel character for uncounted strings in C, and even if your own counted string implementation is “null-byte clean”, aspects of the underlying system may not be (Windows and Unix filesystems forbid embedded null characters, for example).