Python urllib CRLF injection vulnerability (coocoor.com)
94 points by robin0 2653 days ago
9 comments

This is far from uncommon. Back at DEFCON 2017, Orange Tsai gave a talk about inconsistencies between URL parsing libraries in different languages. The opening example was a single URL that had a different hostname when parsed by urllib, urllib2, and requests. He also demoed using unusual characters like spaces and newlines to talk to Redis or SMTP while pretending to be HTTP.

Slides: https://media.defcon.org/DEF%20CON%2025/DEF%20CON%2025%20pre...
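For anyone who hasn't seen this class of bug, here is a minimal sketch of the mechanism. It does not use urllib at all: `build_request` is a hypothetical, deliberately naive helper that pastes an unvalidated path into the request line, which is roughly what happens when a client fails to reject control characters.

```python
def build_request(host: str, path: str) -> bytes:
    """Hypothetical naive client: formats the request with no validation."""
    return f"GET {path} HTTP/1.1\r\nHost: {host}\r\n\r\n".encode("latin-1")


# Attacker-controlled "path" that contains raw CRLF pairs.
malicious_path = "/index.html HTTP/1.1\r\nX-Injected: yes\r\nIgnore:"

raw = build_request("example.com", malicious_path)
lines = raw.split(b"\r\n")

# The single "path" now occupies several protocol lines of its own.
for line in lines:
    print(line)
```

The second line on the wire is `X-Injected: yes`, which the caller never intended to send. Against a line-oriented service such as Redis or SMTP, extra lines like that would be read as commands.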

Orange reported this bug to urllib. The ticket in the HN link is actually a duplicate of Orange's original finding.
Man, that's a really good presentation.
Python urllib3 maintainer here. In December, urllib3 made a change to be more RFC-compliant, which fixed this issue, but that change has not been released yet. We are in the process of looking into that.

I have verified that Requests, which uses us, appears to have its own handling, back at least to requests 2.0 (released in 2013) that prevents this when used directly as an abstraction layer on top of urllib3.

Interesting. I was recently debating whether to use Requests or just urllib3 directly. Figured I'd minimize dependencies by just using urllib3 but didn't think it might actually be more secure to use Requests. Great work btw!
What is your use case that would make minimizing dependencies to this extreme a valuable activity?
I was just using urllib3 to post a form on another website and get the resulting html page, then parsed it with BeautifulSoup.

Since it was just a one-off use case and ultimately very simple, I didn't see the need for any more functionality. Why bother with the extra packages? Or do you think it's still worthwhile to use Requests? Is it not just unnecessary bloat that might slow runtime?

There's a lot to unpack in your comment, but I'll just work with the most easily verifiable thing: what was the response time of the resource you were querying with urllib3, and do you think using Requests instead of urllib3 directly would change the runtime by an order of magnitude (or two)?
I admittedly didn't test the response times between the two, but just felt adding additional dependencies was unnecessary. I don't realistically expect the speed to be too different between the two, but the less I have to rely on external libraries the better. If I can get the job done with urllib3 why use Requests?

Though admittedly, after reading OP's statement, I see that Requests might actually have some extra security that urllib3 alone might not have. But barring security improvements or the need for extra features that Requests has, it seems like using Requests for my use case would be adding unnecessary complexity.

Not the OP, but deploying code on government servers that interact with the public web means that minimizing the required modules saves you piles of paperwork and meetings. I'd rather spend the extra two days writing/testing my own code than filling out paperwork and waiting weeks.
Fair enough in general, but IMO Requests really is worth it.
The link should probably be changed to the actual bug: https://bugs.python.org/issue36276
Which appears to be a duplicate of another bug filed in 2017: https://bugs.python.org/issue30458
I just noticed that. I guess they didn't think it was actually an exploitable bug?

Edit: this bug sat around for almost 2 years. It will be interesting to see if it gets fixed now that it is getting attention on Hacker News.

Relevant (and super cool) previous work, done by Orange Tsai: https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-Ne...
Python 3 urllib and other stdlib protocol modules also use `splitlines` which splits on various unicode "newlines". Could that also be exploitable somehow? https://discuss.python.org/t/changing-str-splitlines-to-matc...
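A quick illustration of the behavior being asked about, using only built-in `str` and `bytes` methods. It says nothing about whether any stdlib module is actually exploitable this way; it only shows that `str.splitlines` treats more characters as line boundaries than CR and LF.

```python
# One string with no CR or LF, but several characters that
# str.splitlines() treats as line boundaries.
text = "HEADER\x0bA\x0cB\x1cC\x85D\u2028E\u2029F"

print(text.splitlines())   # split at every boundary character above
print(text.split("\n"))    # a single piece: there is no "\n" in text

# bytes.splitlines() only recognises \n, \r and \r\n.
print(b"A\x0bB\x0cC".splitlines())
```

So code that filters out `\r` and `\n` and then calls `splitlines` on a `str` can still end up with more lines than it expected.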
Key takeaway: don't expect a library to do the safe thing; always sanitize all your input. (If your language supports taint mode, enabling it can prevent these bugs.)
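One way to act on that advice, as a rough sketch rather than a complete defense: reject any URL containing ASCII control characters or whitespace before handing it to an HTTP client. `assert_safe_url` is a hypothetical helper name, not part of any library, and it deliberately ignores non-ASCII line separators.

```python
import re

# ASCII control characters (including CR and LF), space, and DEL.
_UNSAFE = re.compile(r"[\x00-\x20\x7f]")


def assert_safe_url(url: str) -> str:
    """Return url unchanged, or raise ValueError if it contains unsafe characters."""
    match = _UNSAFE.search(url)
    if match:
        raise ValueError(
            f"unsafe character {match.group()!r} at index {match.start()}"
        )
    return url


print(assert_safe_url("http://example.com/a?b=c"))

try:
    assert_safe_url("http://example.com/a\r\nX-Injected: yes")
except ValueError as exc:
    print("rejected:", exc)
```

Rejecting outright is a design choice here; stripping or escaping the characters silently would hide the fact that someone tried to smuggle them in.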
Does anyone know if this also affects the Requests library? Does it use these under the hood, or is it all httplib? (I'm pretty sure that's the case)
It probably does. Requests is built on top of urllib3 and the bug report mentions that urllib3 is affected as well.
Are urllib and urllib3 the same thing?
No, urllib is a standard library module in Python 3. urllib3 is a 3rd-party package. See also https://news.ycombinator.com/item?id=19423367
seems like an ad for coocoor

actual CVE entry: http://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-9740

Probably worth checking other implementations. The comments already mention that urllib3 is affected as well.
I wouldn't be surprised if there are other libraries in other languages that also have the same bug.

golang had the same bug, which was fixed in this commit: https://github.com/golang/go/commit/829c5df58694b3345cb5ea41...

Why does python need three different versions of urllib?
urllib2 introduced breaking changes relative to urllib, so a new library was added to preserve the functionality of the old one. urllib3 also has breaking changes, but it purposely doesn't live in the standard library so it can be changed more readily.
urllib and urllib2 are in the stdlib of Python 2, and neither of them has a very friendly interface. They have been consolidated to just urllib in Python 3.

urllib3 is a 3rd-party library that powers requests. It tries to offer a more powerful set of features behind a better interface.
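To make the Python 3 consolidation concrete, here is a small sketch of where the old pieces ended up. It builds a form POST with the stdlib only and never sends it; the URL and form fields are placeholders.

```python
from urllib.parse import urlencode    # Python 2: urllib.urlencode
from urllib.request import Request    # Python 2: urllib2.Request

# Build (but do not send) a form POST; example.com is a placeholder.
form = urlencode({"q": "crlf injection", "page": 1}).encode("ascii")
req = Request("https://example.com/search", data=form)

print(req.get_method())   # POST, because data is set
print(req.data)
```

Sending it would be `urllib.request.urlopen(req)`, which is the Python 3 home of what used to be `urllib2.urlopen`.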