Sure, most of these decisions are too entrenched to be fixed.
But yes, URLs should have been structured. We already see paths rendered with breadcrumbs, the protocol replaced with an icon, `www` auto-inserted and hidden, and the domain highlighted. If that's not a structure, I don't know what is.
By cramming everything into the same string, we open ourselves to phishing attacks by domains like `www.google.com.evil.com`, malicious traversal, 404s from mangled relative paths, and much more.
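To make the phishing point concrete, here is a minimal sketch using Python's standard `urllib.parse`. The helper name `is_google_host` and the example URLs are made up for illustration. It only shows how a check on the raw string differs from a check on the parsed host, and it is not a complete anti-phishing test.

```python
from urllib.parse import urlsplit

def is_google_host(url: str) -> bool:
    """Hypothetical helper: compare the parsed hostname, not the raw string."""
    host = urlsplit(url).hostname or ""
    # Accept the bare domain or a true subdomain of it, nothing else.
    return host == "google.com" or host.endswith(".google.com")

phish = "https://www.google.com.evil.com/accounts/login"
userinfo_trick = "https://google.com@evil.com/"

# Substring check on the serialised form: fooled by the lookalike host.
naive_ok = "www.google.com" in phish

# Check on the structured form: the host really ends in evil.com.
structured_ok = is_google_host(phish)

# Text before '@' is userinfo, so the real host here is evil.com.
trick_host = urlsplit(userinfo_trick).hostname
```

With these inputs the substring check accepts the phishing URL while the hostname check rejects it, which is the difference between treating the URL as a string and treating it as a structure.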
URLs are structured. But when you need to send them across the network, store them on disk, or even just pass them between processes on the same machine, you need to define what the byte-level representation is.
I don't see how you can get away from having a defined serialisation format. People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.
But I'm not sure exactly what you mean by "should have been structured". Eventually you've gotta define the bytes if you want to interoperate with other software.
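As a small illustration of the "ad-hoc implementations" problem, here is a sketch comparing string concatenation with a real resolver for a relative reference. It uses `urllib.parse.urljoin` from the Python standard library. The base URL and relative path are invented for the example.

```python
from urllib.parse import urljoin

base = "https://example.com/a/b/c"
relative = "../d"

# Ad-hoc: treat the URL as an opaque string and glue the pieces together.
ad_hoc = base + "/" + relative

# Parse, resolve the relative reference against the base, then serialise again.
resolved = urljoin(base, relative)
```

The ad-hoc version leaves the `..` segment sitting in the path for whoever consumes it next, while the resolver drops the last segment of the base and normalises the result to `https://example.com/a/d`.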
> I don't see how you can get away from having a defined serialisation format.
Yep, that's exactly it. Your TLS certificate is not sent as a string, and neither are your TCP packets, nor the images they carry. Your URLs shouldn't be either, but it's probably too late for that.
> People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.
That's a whole lot better than the current footgun we have, where
> > I don't see how you can get away from having a defined serialisation format.
> Yep, that's exactly it. Your TLS certificate is not sent as string, and neither are your TCP packets, nor the images contained in them.
...all of the things you mentioned have a defined serialization. I expect all of them have had security issues because of problems in their deserialization code.
Yes, of course. Everything that is stored or transmitted must have a defined serialization. And any piece of code as widely used as this is going to have security issues.
What is your point? That strings don't need defined formats? That they have fewer security issues?
That is, it is easy to see that the reason URLs are sent as strings is that we collect them from the user. And it makes perfect sense that we would collect strings of characters from users.
How many URLs, as a percentage of all browser navigation, do you think are typed by hand? And I don't mean "news.ycombinator.com", I mean the full URL, like "https://news.ycombinator.com/news".
And in those rare cases, of course you can collect strings from the user. But then they have to be parsed, and that's what should be on the wire. IP addresses are also sometimes entered by hand, but we don't send those strings in TCP packets.
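The IP address comparison can be sketched with Python's standard `ipaddress` module. The address below is from the reserved documentation range and was picked only for illustration. The sketch shows the step from what the user typed to the packed bytes, not how any particular network stack is implemented.

```python
import ipaddress

typed = "192.0.2.1"                   # what a user types: 9 characters

addr = ipaddress.ip_address(typed)    # parse once, at the edge
wire = addr.packed                    # 4 bytes, as carried in an IPv4 header

# The receiving side rebuilds the same value without any text parsing.
round_trip = ipaddress.ip_address(wire)
```

Here `wire` is four bytes while the typed form is nine characters, and `round_trip` compares equal to `addr`.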
Fewer today than when it started, for sure. Though, I'm not clear that "copy pasted between applications" doesn't have its own problems. I have never seen that done in a "you are passing objects around" way that didn't have terrible security.