Hacker News
by daakus 1909 days ago
To me, safety means having the confidence to use the thing. That applies to me in my own code, but also to others on my team who may work on this code.

I mostly have experience building things in GC languages. But with Rust I managed to safely use [1]:

- stack references in threads

- mmap references kept alive until the threads finish their work

- zero-copy XML parsing (from mmapped data!)

- SSE/AVX-enabled searching
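To illustrate the first item, here is a minimal sketch of threads borrowing data that lives on the caller's stack. It assumes `std::thread::scope` (stable since Rust 1.63), which is not necessarily what the linked gist uses, and `parallel_sum` is a hypothetical helper made up for this example.

```rust
use std::thread;

/// Sums a slice by splitting it across up to `workers` scoped threads.
/// Each thread borrows its chunk of `data` directly; nothing is copied
/// or reference-counted.
fn parallel_sum(data: &[u64], workers: usize) -> u64 {
    let workers = workers.max(1);
    // Ceiling division so every element lands in some chunk; at least 1
    // because `chunks` panics on a chunk size of zero.
    let chunk_len = ((data.len() + workers - 1) / workers).max(1);

    // The scope guarantees all spawned threads are joined before it
    // returns, so the borrow of `data` cannot outlive the data itself.
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk_len)
            .map(|chunk| s.spawn(move || chunk.iter().sum::<u64>()))
            .collect();
        handles
            .into_iter()
            .map(|h| h.join().unwrap())
            .sum::<u64>()
    })
}
```

Calling `parallel_sum(&local, 3)` on a local array compiles because the compiler can prove the threads finish before `local` goes out of scope. Swapping in plain `thread::spawn` would be rejected at compile time, since that requires `'static` data.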

The Rust language empowered me to do these things with a high degree of confidence. Not one segfault or core dump, just lots of compiler errors.

I played with Zig. Admittedly, a small ecosystem is something every language goes through, and the experience would be better with Zig-specific libraries. But Zig doesn't empower library authors to make a large category of bugs impossible; it leaves that to documentation. This is like C, and I don't have enough confidence in myself to use it.

Brilliant people are building powerful, safe-ish, reusable libraries in Rust. For mere mortals like me, this is Awesome.

[1]: https://gist.github.com/daaku/58557e2545612df8f40b13b66b7d3b...

1 comments

Hi, author of the aho-corasick crate here. Your use of it piqued my interest and caused me to take a closer look.

I believe your use of `unsafe` on this line is unsound: https://gist.github.com/daaku/58557e2545612df8f40b13b66b7d3b...

Namely, there is no guarantee that the bytes between `<page>` and `</page>` will be valid UTF-8. It may be the case that you only run this program with UTF-8 input, in which case, UB is never triggered. But it's worth pointing out here since there is nothing actually stopping your program from hitting UB.
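A minimal sketch of the checked alternative, assuming the goal is to borrow the text between two byte markers without copying. The names `page_text` and `find` are hypothetical and not taken from the gist; the naive `find` stands in for a real substring searcher.

```rust
use std::str;

/// Returns the text between the first `open` marker and the next `close`
/// marker, or `None` if a marker is missing or the bytes in between are
/// not valid UTF-8. The returned `&str` borrows from `data` (no copy).
fn page_text<'a>(data: &'a [u8], open: &[u8], close: &[u8]) -> Option<&'a str> {
    let start = find(data, open)? + open.len();
    let len = find(&data[start..], close)?;
    // `from_utf8` validates the bytes and returns an error on bad input.
    // `from_utf8_unchecked` skips that check, and handing it invalid
    // bytes is undefined behavior.
    str::from_utf8(&data[start..start + len]).ok()
}

/// Naive single-needle search over bytes (illustrative only).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0); // `windows(0)` would panic
    }
    haystack.windows(needle.len()).position(|w| w == needle)
}
```

With this shape, invalid input becomes a `None` the caller has to handle rather than UB. The cost is one validation pass over each extracted span.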

Also, as long as you're bringing in the twoway crate, you might as well use it on lines 43 and 48 since you're just searching for a single needle.

The bytes are assumed to be UTF-8 (I was using the safer `from_utf8` until I confirmed the data was UTF-8).

I brought in `twoway` when I couldn't find a way to `rfind` using `aho-corasick`. I'll switch the use over for consistency.

Thanks for the quick code review!

PS: Thanks for ripgrep too!

Ah, gotcha. Yeah, I haven't added reverse searching to aho-corasick yet. Ran out of steam.

Either way, my point here is to be a counter-balance. To be fair, you did say, "But with Rust I managed to safely use." But the code you posted is technically unsound. It's not a huge deal if you know you'll always be feeding the program valid UTF-8. But it is worth mentioning here in this HN thread that is specifically comparing the safety properties of competing programming languages. :-)

Correct and fair. Updated the code to remove the safety issue.
Thank you. :-)