by tfm 3485 days ago

Inasmuch as there is blame to be apportioned in this case, it falls on JavaScript / ECMAScript for its broad definition of acceptable variable names, and (arguably[0]) on browser JS implementations, which will generally accept arbitrary 8-bit data within multiline comments rather than only the Unicode code units that ECMAScript specifies.
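To make the variable-name point concrete, here is a minimal sketch. It assumes the browser ends up treating the file as Latin-1 text, and it uses Python's `str.isidentifier()` as a stand-in for the ECMAScript identifier rule (both are built on Unicode identifier classes, but they are not the same check). The `0x2F2A` length value is a hypothetical choice, picked so that its two bytes spell `/*`.

```python
# First four bytes of a typical JFIF file: SOI marker (FF D8), then APP0 marker (FF E0).
JPEG_PREFIX = bytes([0xFF, 0xD8, 0xFF, 0xE0])

# Hypothetical APP0 length field, chosen so that its two bytes are "/" and "*".
APP0_LENGTH = 0x2F2A


def as_script_text(data: bytes) -> str:
    """Decode bytes as Latin-1, where every byte maps to exactly one character."""
    return data.decode("latin-1")


head = JPEG_PREFIX + APP0_LENGTH.to_bytes(2, "big")
text = as_script_text(head)

# Split into the part that reads as a name and the part that opens a comment.
name, comment_open = text[:4], text[4:]

print(repr(name), name.isidentifier())
print(repr(comment_open))
```

Read as an image, those six bytes are two markers and a segment length. Read as script text under the assumptions above, they are a four-letter name followed by the start of a multiline comment, which then swallows whatever binary data comes next.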

JPEG comments exist for the same reason that EXIF tags exist: it's handy to store metadata alongside the image data, so that it gets copied around when the file gets copied and can be transferred if the image gets re-encoded. Browsers have enough error recovery mechanisms built in that one could likely make a polyglot just by abusing the data segment, maybe even while crafting a legitimate standards-compliant JPEG.
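For reference, a JPEG comment is just a small framed blob: the COM marker (FF FE), a two-byte big-endian length that counts itself plus the payload, then the payload bytes. A rough sketch of writing and reading one follows; the function names `build_com_segment` and `read_com_segment` are made up for illustration and are not from any library.

```python
import struct

COM_MARKER = b"\xff\xfe"  # JPEG comment (COM) segment marker


def build_com_segment(payload: bytes) -> bytes:
    """Frame payload as a COM segment: marker, 2-byte length, payload."""
    # The length field counts its own two bytes plus the payload, not the marker.
    length = len(payload) + 2
    if length > 0xFFFF:
        raise ValueError("payload too long for a single COM segment")
    return COM_MARKER + struct.pack(">H", length) + payload


def read_com_segment(segment: bytes) -> bytes:
    """Return the payload of a COM segment that starts at offset 0."""
    if segment[:2] != COM_MARKER:
        raise ValueError("not a COM segment")
    (length,) = struct.unpack(">H", segment[2:4])
    # Payload starts after marker + length field and runs for (length - 2) bytes.
    return segment[4:2 + length]


segment = build_com_segment(b"hello")
print(segment)
print(read_com_segment(segment))
```

Nothing in the framing constrains what the payload bytes are, which is why the comment is a convenient place to park text that some other parser will read differently.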

Ultimately, bytes are bytes! Interpreting them as different content types can give different results, so keep that in mind.

[0] Resynchronisation / recovery from bit errors is one of the explicit motivations behind the design of Unicode encodings, so the browsers get a pass from me on this one. It's almost certainly possible to craft a suitable JPEG using legitimate code points anyway.
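As a small illustration of the resynchronisation point, taking UTF-8 as the example encoding: if a byte goes missing from the middle of a multi-byte sequence, a lenient decoder loses only that one character and picks up again at the next lead byte. This sketch uses Python's built-in `errors="replace"` mode and is not a model of any browser's actual recovery logic.

```python
def decode_with_recovery(data: bytes) -> str:
    """Decode UTF-8, substituting U+FFFD for any bytes that do not decode."""
    return data.decode("utf-8", errors="replace")


original = "h\u00e9llo w\u00f6rld".encode("utf-8")

# Drop the continuation byte of the first two-byte sequence (index 2),
# leaving a lead byte with nothing valid after it.
damaged = original[:2] + original[3:]

recovered = decode_with_recovery(damaged)
print(recovered)
```

Only the damaged character is replaced; the second two-byte character, later in the string, still decodes correctly.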