Hacker News
by hinkley 2723 days ago
This seems specious to me. The only ways to get an invalid index into a string, in any language, are an index arithmetic error or blindly operating on a string you haven't validated.

If you want all the data after a : character, you slice on the index of the :. Since : is a single ASCII byte, the byte after it is going to be the beginning of a UTF-8 character (or the end of the string).
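To make that concrete, here is a minimal Rust sketch of slicing on the index of the colon rather than on a guessed position. The function name `after_colon` is made up for illustration; `str::find` and range slicing are standard library.

```rust
/// Returns everything after the first ':' in `s`,
/// or `None` if the string contains no colon.
/// (`after_colon` is a hypothetical helper name, not from the thread.)
fn after_colon(s: &str) -> Option<&str> {
    // `find` returns the byte offset of the match. ':' is one ASCII byte,
    // so `idx + 1` is always on a char boundary (possibly the end of `s`).
    let idx = s.find(':')?;
    Some(&s[idx + 1..])
}
```

Calling `after_colon("key:żółw")` yields `Some("żółw")`, and a string without a colon yields `None` instead of a panic.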

You do not under any circumstances guess that the colon is at position 6 in the string. That's not safe. Why are you going cowboy in a language that is so obsessed with safety?

2 comments

I just realized that I have a bug in my GPS driver. It operates on ASCII data, so the [] operator is safe, BUT the data can be corrupted (low chance, but non-zero) in a way that forms a valid multibyte character, and then my code will panic on it while trying to parse and validate the NMEA message.
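One possible way to avoid that panic, sketched in Rust under the assumption that the driver slices fixed byte ranges out of a `&str`: use `str::get`, which returns `None` instead of panicking. The helper name `field` and the offsets are hypothetical, not taken from any real driver.

```rust
/// Extracts the byte range `start..end` from a sentence without panicking.
/// (`field` is a hypothetical helper; the offsets are illustrative only.)
fn field(sentence: &str, start: usize, end: usize) -> Option<&str> {
    // `str::get` returns `None` when the range is out of bounds or when
    // either end falls inside a multibyte character, where `[]` would panic.
    sentence.get(start..end)
}
```

With clean ASCII input, `field("$GPGGA,1234", 1, 6)` gives `Some("GPGGA")`. If corruption turned two bytes into `é`, a range ending inside that character gives `None`, which the caller can treat as a malformed message. Another option would be to keep the data as `&[u8]` and never convert it to `str` at all.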
Panicking on parsing corrupted data seems like a feature to me...

It's like the default rule in a lexer: if the lexer ever reaches it, the character is unrecognized and lexing stops so error handling can proceed.

--edit--

Which I now realize was probably your point.

Truncating a string to fit in a fixed-size storage field is probably the most common reason to split at a particular byte position. If you’re throwing data away anyway, you probably don’t care too much about the little bit of corruption.

Granted, this is certainly incorrect but has little to do with safety, especially if the downstream code has to revalidate everything anyway.
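For the truncation case, a rough Rust sketch of cutting at the nearest character boundary at or below the byte limit, so the result stays valid UTF-8. The name `truncate_to_bytes` is made up here; `str::is_char_boundary` is from the standard library.

```rust
/// Returns the longest prefix of `s` that fits in `max_bytes` bytes
/// without splitting a UTF-8 character.
/// (`truncate_to_bytes` is a hypothetical helper name.)
fn truncate_to_bytes(s: &str, max_bytes: usize) -> &str {
    if s.len() <= max_bytes {
        return s;
    }
    // Walk back from the limit until we land on a char boundary.
    // Index 0 is always a boundary, so the loop terminates.
    let mut end = max_bytes;
    while !s.is_char_boundary(end) {
        end -= 1;
    }
    &s[..end]
}
```

For example, `truncate_to_bytes("héllo", 2)` returns `"h"` rather than half of the `é`. This may leave the field a few bytes short, which matters only if the storage format expects it to be completely filled.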