Hacker News
by azornathogron 1075 days ago
URLs are structured. But when you need to send them across the network, store them on disk, or even just pass them between different processes on the same machine, you need to define what the byte-level representation is.

I don't see how you can get away from having a defined serialisation format. People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.

But I'm not sure exactly what you mean by "should have been structured". Eventually you've gotta define the bytes if you want to interoperate with other software.

> I don't see how you can get away from having a defined serialisation format.

Yep, that's exactly it. Your TLS certificate is not sent as a string, and neither are your TCP packets, nor the images contained in them. Your URLs shouldn't be either, but it's probably too late for that.

> People try to operate directly on the serialised data using ad-hoc implementations and run into trouble.

That's a whole lot better than the current footgun we have, where

    http://http://http://@http://http://?http://#http://
is a valid URL. People don't operate directly on string URLs without trouble either, so at least the structured data is not inviting incorrect usage.

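
As a rough illustration of how that string gets carved up, this is what Python's standard `urllib.parse.urlsplit` makes of it. This is only one parser among several; browsers and other libraries may split it differently, which is part of the problem.

```python
from urllib.parse import urlsplit

url = "http://http://http://@http://http://?http://#http://"
parts = urlsplit(url)

print(parts.scheme)    # http
print(parts.netloc)    # http:   (authority ends at the first "/")
print(parts.hostname)  # http    (the host is literally named "http")
print(parts.path)      # //http://@http://http://
print(parts.query)     # http://
print(parts.fragment)  # http://
```

Only the first `http` is the scheme; the second is a hostname with an empty port, and every later `http://` is just path, query or fragment text.
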
> > I don't see how you can get away from having a defined serialisation format.

> Yep, that's exactly it. Your TLS certificate is not sent as string, and neither are your TCP packets, nor the images contained in them.

...all of those things mentioned have defined serialization. I expect all of them have had security issues because of problems with deserialization code.

Yes, of course. Everything that is stored or transmitted must have a defined serialization. And any piece of code as widely used as this is going to have security issues.

What is your point? That strings don't need defined formats? That they have fewer security issues?

Your certificate isn't entered by hand, though?

That is, it is easy to see that the reason we have URLs sent as strings is that we collect them from the user. And it makes perfect sense that we would collect strings of characters from users.

How many URLs, as a percent of all browser navigation, do you think are typed by hand? And I don't mean "news.ycombinator.com", I mean the full URL, like "https://news.ycombinator.com/news".

And in those rare cases, of course you can collect strings from the user. But then they have to be parsed, and that's what should be on the wire. IP addresses are also sometimes entered by hand, but we don't send those strings in TCP packets.
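
The IP address case can be sketched with Python's standard `ipaddress` module: the typed text is parsed once at the edge, and what goes into a packet header is the fixed-size binary form. The address used is from the reserved documentation range and is only an example.

```python
import ipaddress

typed = "192.0.2.1"                  # what a person types
addr = ipaddress.ip_address(typed)   # parse once; malformed input raises ValueError
wire = addr.packed                   # 4 raw bytes in network byte order

print(len(typed), len(wire))         # 9 4
print(wire.hex())                    # c0000201
```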

Fewer today than when it started, for sure. Though, I'm not clear that "copy pasted between applications" doesn't have its own problems. I have never seen that done in a "you are passing objects around" way that didn't have terrible security.