by rspeer 2758 days ago
UTF-7 is "fun" because encoding libraries tend to support it, but since nobody cares about it, edge cases in the implementation may go undiscovered for a while.

Back in Python 2.7.5, the UTF-7 decoder didn't do range checking, so this script [1] produced a "Unicode string" containing the codepoint U+DEADBEEF. (The maximum valid codepoint is U+10FFFF.) Such a string would crash regexes, corrupt databases, etc., which allowed denial-of-service attacks against any function that let the caller specify an arbitrary encoding.

(This is fixed in all extant versions of Python.)
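
For anyone who still lets callers pick the encoding, here is a rough sketch of the defensive side, not the exploit from the gist. The names `safe_decode`, `has_only_valid_codepoints`, and the particular allowlist are hypothetical, made up for illustration; the idea is just to refuse rarely exercised codecs like UTF-7 and to sanity-check the codepoint range of the result.

```python
import codecs

MAX_CODEPOINT = 0x10FFFF  # highest valid Unicode codepoint

# Hypothetical allowlist: only codecs the application actually needs.
# Names are normalized through codecs.lookup so aliases compare equal.
ALLOWED_ENCODINGS = {
    codecs.lookup(name).name for name in ("utf-8", "utf-16", "latin-1")
}


def has_only_valid_codepoints(text: str) -> bool:
    """Return True if every character is within the Unicode range."""
    return all(ord(ch) <= MAX_CODEPOINT for ch in text)


def safe_decode(data: bytes, encoding: str) -> str:
    """Decode data, rejecting encodings outside the allowlist."""
    # codecs.lookup raises LookupError for unknown encoding names.
    canonical = codecs.lookup(encoding).name
    if canonical not in ALLOWED_ENCODINGS:
        raise ValueError(f"encoding not allowed: {encoding!r}")
    text = data.decode(canonical)
    # Redundant on a correct decoder, but cheap as a belt-and-braces check.
    if not has_only_valid_codepoints(text):
        raise ValueError("decoded text contains an out-of-range codepoint")
    return text
```

With this, `safe_decode(data, "utf-7")` raises `ValueError` before the UTF-7 decoder ever runs, which is the point: the less-tested codec is never reachable from user input.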

[1] https://gist.github.com/rspeer/7559750