|
|
|
|
|
by inferiorhuman
2316 days ago
|
|
The Rust version probably could be made to work at an equivalent speed with enough effort. But at a high-level, Go was much more enjoyable to work with. This is a side project and it has to be fun for me to work on it. The Rust version was actively un-fun for me, both because of all of the workarounds that got in the way and because of the extremely slow compile times. Obviously you can tell from the nature of this project that I value fast build times :) Was the Rust parser written by hand or did you use one of the parser frameworks (e.g. nom or pest) out there? nom, for instance, goes to great lengths to be zero-copy which would probably be a big benefit here. |
|
I assume by zero-copy you mean that identifiers in the AST are slices of the input file instead of copies? I was also careful to do this in both the Go and Rust versions. It's somewhat complicated because some JavaScript identifiers can technically have escape sequences (e.g. "\u0061bc" is the identifier "abc"), which require dynamic memory allocation anyway. See "allocatedNames" in the current parser for how this is handled.
Note that strings aren't slices of the input file because JavaScript strings are UTF-16, not UTF-8, and can have unpaired surrogates. So I represent string contents as arrays of 16-bit integers instead of 8-bit slices (in both Go and Rust).
In the past I tried using WTF-8 encoding (https://simonsapin.github.io/wtf-8/) for string contents, since that can both represent slices of the input file while also handling unpaired surrogates, but I ended up removing it because it complicated certain optimizations. I think the main issue was having to reason through weird edge cases such as constant folding of string addition when two unpaired surrogates are joined together. I think it's still possible to do this but I'm not sure how much of a win it is.