by Animats
3464 days ago
I looked at a few. "itoa" is clearly premature optimization. That uses an old hack appropriate to machines where integer divide was really expensive, like an Arduino-class CPU. It's unlikely to help much on anything with a modern divide unit. "httparse", "idna", "serde-json", and "inflate" should be made 100% safe: they all take external input, are used in web-facing programs, and are classic attack vectors. Not much use of number-crunching libraries; that reflects what you do. I'll look at some more later. How to deal effectively with incoming UTF-8, especially bad UTF-8, may need some thinking.
|
"itoa" is code that is copied directly from the Rust core library. Every character of unsafe code is identical to what literally everybody who uses Rust is already running (including people using no_std). Anybody who has printed an integer in Rust has run the same unsafe code. It is some of the most widely used code in Rust. If I had rewritten any of it, even using entirely safe code, it would be astronomically more likely to be wrong than copying the existing code from Rust. The readme contains a link to the exact commit and block of code from which it is copied.
As for premature optimization: no, it was driven by a very standard (across many languages) set of benchmarks: https://github.com/serde-rs/json-benchmark
"serde_json" uses an unsafe assumption that a slice of bytes is valid UTF-8 in two places. This is either for performance or for maintainability, depending on how you look at it. Performance is the more obvious reason but in fact we could get all the same speed just by duplicating most of the code in the crate. We support deserializing JSON from bytes or from a UTF-8 string, and we support serializing JSON to bytes or to a UTF-8 string. Currently these both go through the same code path (dealing with bytes) with an unchecked conversion in two important spots to handle the UTF-8 string case. One of those cases takes advantage of the assumption that if the user gave us a &str, they are guaranteeing it is valid UTF-8. The other case is taking advantage of the knowledge that JSON output generated by us is valid UTF-8 (which is checked along the way as it is produced).
Here again, both of those uses are driven by the benchmarks in the repo above and account for a substantial performance improvement over a checked conversion.
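To make the two unchecked conversions concrete, here is a minimal sketch of the pattern described above. It is not the actual serde_json code: every function name here (`quoted_bytes`, `quoted_from_slice`, `quoted_from_str`, `write_json_string`, `to_string_checked`, `to_string_unchecked`) is hypothetical, and the "parser" only extracts the first quoted run with no escape handling. The point is just that one byte-level code path serves both the bytes and the &str entry points, with the unchecked conversion justified by an invariant rather than by a runtime check.

```rust
use std::str;

/// Shared byte-level path (hypothetical): returns the bytes between the
/// first pair of double quotes. No escape handling, for brevity.
fn quoted_bytes(input: &[u8]) -> Option<&[u8]> {
    let start = input.iter().position(|&b| b == b'"')? + 1;
    let len = input[start..].iter().position(|&b| b == b'"')?;
    Some(&input[start..start + len])
}

/// Entry point for untrusted bytes: the result must be validated.
fn quoted_from_slice(input: &[u8]) -> Option<&str> {
    str::from_utf8(quoted_bytes(input)?).ok()
}

/// Entry point for &str: the caller already guarantees valid UTF-8.
fn quoted_from_str(input: &str) -> Option<&str> {
    let bytes = quoted_bytes(input.as_bytes())?;
    // SAFETY: `input` is valid UTF-8 and we only cut at ASCII '"' bytes,
    // which never appear inside a multi-byte UTF-8 sequence.
    Some(unsafe { str::from_utf8_unchecked(bytes) })
}

/// Shared byte-level writer (hypothetical): emits a JSON string literal,
/// escaping only '"', '\\' and '\n' to keep the sketch short.
fn write_json_string(out: &mut Vec<u8>, s: &str) {
    out.push(b'"');
    for ch in s.chars() {
        match ch {
            '"' => out.extend_from_slice(b"\\\""),
            '\\' => out.extend_from_slice(b"\\\\"),
            '\n' => out.extend_from_slice(b"\\n"),
            c => {
                let mut buf = [0u8; 4];
                out.extend_from_slice(c.encode_utf8(&mut buf).as_bytes());
            }
        }
    }
    out.push(b'"');
}

/// Checked variant: re-validates the whole buffer at the end.
fn to_string_checked(s: &str) -> String {
    let mut out = Vec::new();
    write_json_string(&mut out, s);
    String::from_utf8(out).expect("writer emitted invalid UTF-8")
}

/// Unchecked variant: skips the final validation pass.
fn to_string_unchecked(s: &str) -> String {
    let mut out = Vec::new();
    write_json_string(&mut out, s);
    // SAFETY: the writer only appends ASCII escapes or the UTF-8
    // encoding of whole `char`s, so `out` is valid UTF-8.
    unsafe { String::from_utf8_unchecked(out) }
}
```

In this sketch the checked and unchecked variants return the same values; the only difference is whether the buffer is scanned again to validate it. Whether skipping that scan matters in practice is the kind of question the benchmarks linked above are meant to answer.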