The URL parser work is taking long not because it's complex, but because it's stalled. URL parsing is complex; however, all of that complexity was already dealt with when the Servo team wrote the rust-url crate ages ago, so it's not a factor here.
The URL parser integration was a proof of concept. It doesn't really improve stuff (aside from a slight security benefit from using Rust) so there wasn't pressure to land it; it was just a way of trying out the then-new Rust integration infra, and inspiring better Rust integration infra.
One of the folks on the network team started it, and I joined in later. But that person got busy and I started working on Stylo. So that code exists, and it works, but there's still work to be done to enable it, and not much impetus to do this work.
This work is mostly:
- Ferreting out where Gecko and Servo don't match so that we can pass all tests. We've done most of this already; whatever's left is Gecko not matching the spec, and we need to figure out how we want to fix that.
- Performance -- In the integration we currently do some stupid stuff wrt serialization and other things, because it was a proof of concept. This will need to be polished up so we don't regress.
- Telemetry -- Before shipping we need to ship it to Nightly in parallel with the existing parser and figure out how often its output doesn't match the existing parser's.
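To make the telemetry step concrete, here is a minimal sketch of a "run both parsers and count mismatches" check. It is not the actual Gecko telemetry code: `legacy_parse`, `new_parse`, and `shadow_compare` are hypothetical names, and both parsers are stand-ins built on Python's `urllib.parse.urlsplit`. The only difference between them is that the stand-in for the new parser normalizes an empty path to "/", which is enough to produce a mismatch.

```python
from urllib.parse import urlsplit


def legacy_parse(url):
    """Hypothetical stand-in for the existing parser."""
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.path)


def new_parse(url):
    """Hypothetical stand-in for the new parser.

    Assumption for illustration: it normalizes an empty path to "/".
    """
    parts = urlsplit(url)
    return (parts.scheme, parts.hostname, parts.path or "/")


def shadow_compare(urls, primary, candidate):
    """Run both parsers on every URL and collect the disagreements.

    Returns (mismatch_rate, mismatches), where each mismatch is a tuple
    of (url, primary_result, candidate_result).
    """
    mismatches = []
    for url in urls:
        expected = primary(url)
        actual = candidate(url)
        if expected != actual:
            mismatches.append((url, expected, actual))
    rate = len(mismatches) / len(urls) if urls else 0.0
    return rate, mismatches


if __name__ == "__main__":
    sample = ["http://example.com", "http://example.com/a"]
    rate, mismatches = shadow_compare(sample, legacy_parse, new_parse)
    print(f"mismatch rate: {rate:.0%}")
    for url, old, new in mismatches:
        print(f"{url}: old={old} new={new}")
```

In a real rollout the result of the existing parser would still be the one that gets used; the candidate's output would only be compared and reported.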
As someone who once tried to write code to do it to avoid pulling in a dependency: never again. It's not just that the spec is 60 pages long, but that the actual behaviour out in the real world is miles away from the spec. The web is a complex place where standards are... rarely standard.
URLs have been a security issue for browsers in the past, and can get pretty hairy, from UTF-8 encoded domain names to whatever you want to "urlencode". For example, you can encode whole images into URLs (data: URLs), for embedding them in CSS files.
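As a small illustration of the two points above, here is a sketch using only the Python standard library. The helper name `to_data_url` and the sample inputs are made up for the example; the payload is just the 8-byte PNG signature rather than a full image.

```python
import base64
from urllib.parse import quote, unquote


def to_data_url(payload: bytes, mime_type: str) -> str:
    """Embed raw bytes in a base64 data: URL (hypothetical helper)."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Only the 8-byte PNG signature, standing in for a real image file.
png_signature = b"\x89PNG\r\n\x1a\n"
data_url = to_data_url(png_signature, "image/png")

# Percent-encoding ("urlencode") an arbitrary string, including
# characters that are otherwise meaningful inside a URL.
raw = "a b&c=d/é"
encoded = quote(raw, safe="")

if __name__ == "__main__":
    print(data_url)
    print(encoded)
    print(unquote(encoded) == raw)
```

The resulting string could be dropped into a CSS `url(...)` value, which is how images end up inlined in stylesheets.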
Old IE versions had a hard URL length limit and were very picky with the characters in domain names, both limitations included as "security fixes" (which broke the standards).
I'd say the change of the encoding stack to encoding-rs is pretty significant; while it's not that much code, it's stuff that gets used throughout the codebase.
https://wiki.mozilla.org/Oxidation#Rust_components_in_Firefo...
Completed:
In progress: