Hacker News
by jneal 3440 days ago
This is pretty scary. When you hear security professionals explain to laymen how to identify phishing attacks, the advice is almost always to check the URL: make sure you're actually at google.com and not go0gle.com, or something like that.

I can't even imagine what legitimate use there is to placing an entire HTML document into the URL. Just seems like a hack someone came up with as a solution to a problem, not the right solution, but a solution nonetheless.

2 comments

>I can't even imagine what legitimate use there is to placing an entire HTML document into the URL. Just seems like a hack someone came up with as a solution to a problem, not the right solution, but a solution nonetheless.

It allows you to embed data in a URL, meaning you can link to documents that aren't necessarily stored anywhere, such as generated images/text.
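
To make the idea concrete, here is a minimal Python sketch of packing a generated HTML document into a data: URL and reading it back. The helper names (`html_to_data_url`, `data_url_to_text`) and the sample page are made up for illustration; only the `data:<mime>;base64,<payload>` layout comes from the scheme itself.

```python
import base64


def html_to_data_url(html: str) -> str:
    """Return a base64 data: URL carrying the given HTML document."""
    payload = base64.b64encode(html.encode("utf-8")).decode("ascii")
    return f"data:text/html;charset=utf-8;base64,{payload}"


def data_url_to_text(url: str) -> str:
    """Decode a base64 data: URL (as built above) back into text."""
    header, _, payload = url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data: URL")
    return base64.b64decode(payload).decode("utf-8")


# Hypothetical generated page: it exists only inside the link itself.
page = "<h1>Generated page</h1><p>No server stores this document.</p>"
url = html_to_data_url(page)
print(url)
```

Anyone who receives the printed link has the whole document; nothing has to be hosted anywhere.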

I suppose you could make an argument that it shouldn't be shown as a regular URL.

Why even render the content of data:text/html in the first place?
To give an example, I've seen some multiplayer games with dynamic content that use WebSockets for communication with the server and update various information via data URIs. I've never seen a text/html data URI yet (mostly image transmission, to be honest), but for a multi-client WebSockets-type application I definitely wouldn't rule out that sort of thing.

I agree that blocking the rendering of data:text/html (and any other MIME type that could be used maliciously) from the address bar is a good idea. I can't think of a valid use case for that scenario. It seems like similar attack vectors have been known for some time (https://nakedsecurity.sophos.com/2012/08/31/phishing-without...).
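
As a rough sketch of what such a block could look like, the Python below flags data: URLs whose MIME type can carry markup or script. The `ACTIVE_TYPES` set and both function names are assumptions for illustration, not any browser's actual policy; the fallback to text/plain when the type is omitted follows the data: URL scheme's default.

```python
from typing import Optional

# Assumed list of "active" types; a real policy would need careful review.
ACTIVE_TYPES = {
    "text/html",
    "application/xhtml+xml",
    "image/svg+xml",
    "text/javascript",
    "application/javascript",
}


def data_url_mime_type(url: str) -> Optional[str]:
    """Return the lowercased MIME type of a data: URL, or None for other URLs."""
    if url[:5].lower() != "data:":
        return None
    header = url[5:].split(",", 1)[0]           # everything before the payload
    mime = header.split(";", 1)[0].strip().lower()
    return mime or "text/plain"                 # the scheme's default type


def should_block_top_level(url: str) -> bool:
    """True if the URL is a data: URL with a type that could run markup or script."""
    mime = data_url_mime_type(url.strip())
    return mime is not None and mime in ACTIVE_TYPES


print(should_block_top_level("data:text/html,<h1>hi</h1>"))
print(should_block_top_level("data:image/png;base64,iVBOR"))
```

A filter like this would still let image data URIs through, which covers the game use case mentioned above.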

Because that's precisely what the 'data:' URI is supposed to do. The URI is only a description of some resource, there's no reason one description should be treated differently than any other, unless it's actually pointing to a different resource.
It's more that the idea of rendering the HTML code in this fashion does not make sense to me. Maybe print the code to the page instead of rendering it. Anything would be better than rendering the code; I can't even come up with a possible use case for that functionality, can you?
You could generate a webpage and link to it without needing to host it somewhere.

At any rate, if you allow a URI scheme that embeds the data in the URI itself, it'd be very odd to arbitrarily restrict the valid MIME types. It'd be like forbidding an http URL from linking to a JPEG.

Well in a way you're just offloading the cost of hosting that code/data in that case. Instead of hosting it yourself, the page with the link is hosting that webpage.

Well it wouldn't really be arbitrary, it'd be specifically HTML and/or JS, for security related purposes.

Not to mention, don't blobs allow us to do this now anyway?
>make sure you're actually at google.com and not go0gle.com

And how about the domain with a character that looks more like 'o' than '0'? There was something on HN recently about that. The example given would have completely fooled me, since it looked the same as the real domain.

https://en.m.wikipedia.org/wiki/IDN_homograph_attack is what you're referencing, I believe :)
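
For anyone curious how such a lookalike can be spotted programmatically, here is a small Python sketch using only the standard library. The hostname is a made-up example that swaps both Latin "o" letters for Cyrillic U+043E, and `non_ascii_chars` is a hypothetical helper, not a complete homograph detector.

```python
import unicodedata


def non_ascii_chars(hostname: str) -> list:
    """List (character, Unicode name) pairs for every non-ASCII character."""
    return [(ch, unicodedata.name(ch, "UNKNOWN")) for ch in hostname if ord(ch) > 127]


# Made-up lookalike: the two "o" letters are Cyrillic (U+043E), not Latin.
lookalike = "g\u043e\u043egle.com"

print(lookalike == "google.com")     # the strings differ even if they render alike
print(non_ascii_chars(lookalike))    # names the Cyrillic characters
print(lookalike.encode("idna"))      # ASCII form, with an "xn--" label prefix
```

Showing the ASCII `xn--` form instead of the rendered Unicode is one way a client can make the difference visible to the user.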
Interestingly enough, HN itself was actually susceptible to this, and it was reported by a security researcher:

https://news.ycombinator.com/security.html